99 unclassified proteins [S. cerevisiae, YLR309c] 6e-20
03.04 budding, cell polarity and filament formation [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-19
03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-19
03.19 recombination and DNA repair [S. cerevisiae, YNL250w] 1e-15
l genome replication, transcription, recombination and repair [M. jannaschii, MJ1322] 2e-14
30.13 organization of chromosome structure [S. cerevisiae, YDR285w] 2e-09
09.04 biogenesis of cytoskeleton [S. cerevisiae, YKL179c] 3e-09
09.13 biogenesis of chromosome structure [S. cerevisiae, YLR086w] 2e-07
03.01 cell growth [S. cerevisiae, YNL079c] 2e-07
08.99 other intracellular-transport activities [S. cerevisiae, YNL079c]
03.22.01 cell cycle check point proteins [S. cerevisiae, YGL086w] 1e-06
10.05.99 other pheromone response activities [S. cerevisiae, YHR158c]
04.05.01.04 transcriptional control [S. cerevisiae, YDR217c] 4e-06
98 classification not yet clear-cut [S. cerevisiae, YJR134c] 2e-05
05.04 translation (initiation, elongation and termination) [S. cerevisiae,

r general function prediction [M. jannaschii, MJ1254] 0.001

BL00387A 

BL00411H 

BL00411G 

BL00411F 

BL00411E Kinesin motor domain proteins 
BL00411D Kinesin motor domain proteins 
BL00411C Kinesin motor domain proteins 
BL00411B Kinesin motor domain proteins 
BL00411A Kinesin motor domain proteins 

d2kin.l 3.29.1.5.3 Kinesin [Rat (Rattus norvegicus)] 2e-68

d2tmab 1.105.4.1.1 Tropomyosin [rabbit (Oryctolagus cuniculus)] 4e-05

d3kar 3.29.1.5.4 Kinesin [Baker's yeast (Saccharomyce 2e-09 

3.6.1.32 Myosin ATPase 5e-25
nucleus 4e-27
phosphotransferase 3e-16
duplication 6e-20
citrulline 6e-18
tandem repeat 4e-24
heterodimer 3e-28
endocytosis 1e-23
heart 1e-17
transmembrane protein 2e-28
serine/threonine-specific protein kinase 3e-16
zinc finger 1e-23
surface antigen 2e-16
DNA binding 1e-25
metal binding 1e-23
muscle contraction 4e-24
heterotetramer 4e-24
acetylated amino end 2e-19
actin binding 5e-25
mitosis 3e-58
microtubule binding 3e-58
ATP 3e-58
thick filament 4e-24
phosphoprotein 9e-29
leucine zipper 1e-12
skeletal muscle 8e-24
disulfide bond 1e-12
heterotrimer 1e-29
calcium binding 6e-18
alternative splicing 4e-21
P-loop 2e-63
coiled coil 3e-58
heptad repeat 1e-25
methylated amino acid 4e-24
peripheral membrane protein 1e-23
dimer 1e-12
cardiac muscle 1e-17
hydrolase 5e-25
microtubule 6e-15
muscle 7e-23
membrane protein 6e-20
GTP binding 8e-22
EF hand 6e-18
cell division 1e-25
cytoskeleton 4e-24
hair 6e-18
Golgi apparatus 8e-24
calmodulin binding 1e-23
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 3e-16
[SUPFAM] myosin motor domain homology 5e-25
[SUPFAM] alpha-actinin actin-binding domain homology 1e-13
[SUPFAM] kinesin-related protein KIP1 9e-27
[SUPFAM] kinesin-related protein CIN8 4e-36
[SUPFAM] kinesin heavy chain 4e-24
[SUPFAM] plectin 1e-13
[SUPFAM] trichohyalin 6e-18
[SUPFAM] kinesin-related protein KIF3 1e-29
[SUPFAM] kinesin-related protein KIF2 3e-20
[SUPFAM] ribosomal protein S10 homology 1e-13
[SUPFAM] giantin 8e-24
[SUPFAM] protein kinase homology 3e-16
[SUPFAM] protein kinase C zinc-binding repeat homology 2e-13
[SUPFAM] kinesin-related protein unc-104 8e-26
[SUPFAM] human early endosome antigen 1 1e-23
[SUPFAM] unassigned kinesin-related proteins 1e-28
[SUPFAM] Mycoplasma genitalium hypothetical protein MG218 4e-17
[SUPFAM] myosin heavy chain 5e-25
[SUPFAM] conserved hypothetical P115 protein 4e-20
[SUPFAM] centromere protein E 5e-24
[SUPFAM] calmodulin repeat homology 6e-18
[SUPFAM] kinesin-related protein KLP61F 1e-25
[SUPFAM] hypothetical protein MJ0914 3e-12
[SUPFAM] kinesin-related protein MKLP-1 2e-63
[SUPFAM] pleckstrin repeat homology 8e-26
[SUPFAM] hypothetical protein MJ1322 4e-13
[SUPFAM] kinesin-related protein KIF1B 3e-28
[SUPFAM] kinesin motor domain homology 2e-63
[SUPFAM] kinesin-related protein KLPA 7e-25
[SUPFAM] kinesin-related protein nodA 1e-12
[SUPFAM] kinesin-related protein Eg5 5e-30

[PROSITE] ATP_GTP_A 1
[PFAM] Kinesin motor domain
[KW] irregular
[KW] 3D
[KW] LOW_COMPLEXITY 7.53 %
[KW] COILED_COIL 19.78 %



SEQ MESNFNQEGVPRPSYVFSADPIARPSEINFDGIKLDLSHEFSLVAPNTEANSFESKDYLQ
SEG
COILS
3kar-

SEQ VCLRIRPFTQSEKELESEGCVHILDSQTVVLKEPQCILGRLSEKSSGQMAQKFSFSKVFG
SEG
COILS
3kar-

SEQ PATTQKEFFQGCIMQPVKDLLKGQSRLIFTYGLTNSGKTYTFQGTEENIGILPRTLNVLF
SEG
COILS
3kar-

SEQ DSLQERLYTKMNLKPHRSREYLRLSSEQEKEEIASKSALLRQIKEVTVHNDSDDTLYGSL
SEG
COILS
3kar-

SEQ TKSLNISEFEESIKDYEQANLNMANSIKFSVWVSFFEIYNEYIYDLFVPVSSKFQKRKML
SEG
COILS
3kar- EEEEEEEEEEETTEEEETTTCC CCEE

SEQ RLSQDVKGYSFIKDLQWIQVSDSKEAYRLLKLGIKHQSVAFTKLNNASSRSHSIFTVKIL
SEG
COILS
3kar- EEETTTTE-EEEETTCCEEECCG

SEQ QIEDSEMSRVIRVSELSLCDLAGSERTMKTQNEGERLRETGNINTSLLTLGKCINVLKNS
SEG
COILS
3kar- E 1 - EETTTTC EEEEEEEEE ECCCCCCC CCCHHHHHHHHHHHHHHHHHHHHHHHHTT

SEQ EKSKFQQHVPFRESKLTHYFQSFFNGKGKICMIVNISQCYLAYDETLNVLKFSAIAQKVC
SEG
COILS
3kar- TTTT- Itccttttthhhhhhgggctttteeeeeeeecccggghhhhhhhhhhhh

SEQ VPDTLNSSQDKLFGPVKSSQDVSLDSNSNSKILNVKRATISWENSLEDLMEDEDLVEELE
SEG xxxxxxxxxxxxxxxxxx

COILS 

3kar- 

SEQ NAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDLIEDLKKKLINEKKEKLTLEFK 

SEG xxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxx . . 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

3kar-

SEQ IREEVTQEFTQYWAQREADFKETLLQEREILEENAERRLAIFKDLVGKCDTREEAAKDIC

SEG 

COILS CCCCCCC 

3kar- 

SEQ ATKVETEEATACLELKFNQIKAELAKTKGELIKTKEELKKRENESDSLIQELETSNKKII 

SEG

COILS CCCCCCCCCCCCCCC

3kar- 

SEQ TQNQRIKELINIIDQKEDTINEFQNLKSHMENTFKCNDKADTSSLIINNKLICNETVEVP 

SEG 

COILS CCCCCCCCCCCCCCC 

3kar- 

SEQ KDSKSKICSERKRVNENELQQDEPPAKKGSIHVSSAITEDQKKSEEVRPNIAEIEDIRVL 

SEG 

COILS CCCC



3kar- 



SEQ QENNEGLRAFLLTIENELKNEKEEKAELNKQIVHFQQELSLSEKKNLTLSKEVQQIQSNY

SEG xxxxxxxxxxxxxxxx

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

3kar- 

SEQ DIAIAELHVQKSKNQEQEEKIMKLSNEIETATRSITNNVSQIKLMHTKIDELRTLDSVSQ 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

3kar- 

SEQ ISNIDLLNLRDLSNGSEEDNLPNTQLDLLGNDYLVSKQVKEYRIQEPNRENSFHSSIEAI 

SEG 

COILS 

3kar- 

SEQ WEECKEIVKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETL

SEG xxxxxxxxxxxxx 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ IQQLKEELQEKNVTLDVQIQHWEGKRALSELTQGVTCYKAKIKELETILETQKVERSHS 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ AKLEQDILEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMK 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCC

3kar- 

SEQ HLLQLKEEEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIK 

SEG . xxxxxxxxxxxxxxxxxxx 

COILS CCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ QVQKEVSVMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQY 

SEG 

COILS CCCCCCCCCCCC 

3kar- 

SEQ ERACKDLNVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLE 

SEG xxxxxxxxxxxxxxxxx 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCC

3kar- 

SEQ TKNNQRSNKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIR 

SEG 

COILS CC 

3kar- 

SEQ NKEMKKYAEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSN

SEG

COILS

3kar- 

SEQ VQKDNEIEQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGS 

SEG 

COILS 

3kar- 

SEQ WLDSCEVSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVEIPKARKRKSNE 

SEG 

COILS 

3kar- 

SEQ MEEDLVKCENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVN 

SEG 

COILS 

3kar- 

SEQ LATKKKEGTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKL 

SEG 

COILS 

3kar- 

SEQ YTSEISSPIDISGQVILMDQKMKESDHQIIKRRLRTKTAK 

SEG 

COILS 

3kar- 



Prosite for DKFZphtes3_35b4.3



PS00017 152->160 ATP_GTP_A PDOC00017 



Pfam for DKFZphtes3_35b4.3
HMM_NAME Kinesin motor domain 

HMM *RCRPlNeREindgcscvVQWPpWtGyktvhnghegds phks 

R+RP+ + E++ + +V + ++++ ++ + 
Query 64 RIRPFTQSEKELESEGCVHILDSQTVVLKEPQCILGRLSEKSSGQMAQK 112 

HMM FtFDHVFWWncTQedVYdtvAHPIVDDcFhGYNCTI FAYGQTGSGKTYTM 

F+F +VF++++TQ++ +++ + V+D+++G IF+YG T SGKTYT 
Query 113 FSFSKVFGPATTQKEFFQGCIMQPVKDLLKGQSRLIFTYGLTNSGKTYTF 162 

HMM MGpggehPDHmGIIPRcCHDIFdrldkfqekDhdFW 

G +++GI+PR+++ +FD++ + +++ 

Query 163 QG TEENIGILPRTLNVLFDSLQERL-YTKMNLKPHRSREYLRLSSE 207

HMM 

Query 208 QEKEEIASKSALLRQIKEVTVHNDSDDTLYGSLTNSLNISEFEESIKDYE 257

HMM hVkCSYMEIYNEel YDLLCPnP . . . qhMkpLnlHEHPN 

+V +S++EIYNE+IYDL +P++ Q++K L++ + + 
Query 258 QANLNMANSIKFSVWVSFFEIYNEYIYDLFVPVSSKFQKRKMLRLSQDVK 307

HMM MGpYVqGCTEf HVcSYeDachWIWqGnknRHVAaTnMNdhSSRSHtlFTI 

++++++ v +A +++ +G K+ VA T++N SSRSH+IFT+ 
Query 308 GYSFIKDLQWIQVSDSKEAYRLLKLGIKHQSVAFTKLNNASSRSHSIFTV 357 

HMM HVeQrHk . qcdehvcHSKMNLVDLAGSERvnrTGAEGQRlKEGcNINqSL 

++ Q + + +++S ++L DLAGSER+ +T + EG RL+E +NIN SL 
Query 358 KILQIEDSEMSRVIRVSELSLCDLAGSERTMKTQNEGERLRETGNINTSL 407

HMM ttLGnVInaLaDgqTKYmYgghgHIPYRDSKLTWlLQDSLGGNcKTcMIA 
+TLG++IN+L + + + +H+P+R+SKLT+ +Q + G +K CMI+ 
Query 408 LTLGKCINVLKNSE KSKFQQHVPFRESKLTHYFQS FFNGKGKICMIV 454 

HMM CIWPadWNYEETLSTLRYAdRAKnlkNkPQINEDPca* 

+14- + Y+ETL++L++ + A+++ + ++N+++++ 
Query 455 NISQCYLAYDETLNVLKFSAIAQKVCVPDTLNSSQDK 491 
DKFZphtes3_35b5



group: metabolism 

DKFZphtes3_35b5 encodes a novel 466 amino acid protein with similarity to the bovine accessory
subunit of the vacuolar ATPase and to the rat C7-1 protein.

The vacuolar proton-ATPase (V-ATPase) translocates protons into intracellular organelles or
across the plasma membrane of specialized cells. The catalytic domain consists of a hexamer of
3 A subunits and 3 B subunits, plus accessory subunits C, D, and E. The rat homolog C7-1 appears
to be enriched in the frontal cortex of aged adult rats.

The novel protein can find application in modulating V-ATPase activity in endocytic and
secretory organelles.



strong similarity to bovine vacuolar ATPase (EC 3.6.1.-) chain Ac45

complete cDNA, complete cds, potential start at bp 8, EST hits
matches PIR:I54197 hypothetical protein perfectly, but possesses 186
additional aa at the N-terminus

Sequenced by DKFZ

Locus: unknown

Insert length: 2043 bp 

Poly A stretch at pos. 2033, polyadenylation signal at pos. 2012 
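
The poly(A) and polyadenylation-signal positions quoted above can be checked against the 3' end of the insert. The following is a minimal sketch, not the annotation pipeline's own method: the function name `find_polya_features`, the choice of hexamers (`AATAAA` and `ATTAAA`) and the minimum tail length are assumptions. The reported positions are treated as 0-based offsets, which is inferred from the sequence rather than stated in the report.

```python
import re

# Canonical hexamer and its most common variant. Which motifs the original
# annotation searched for is an assumption.
SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_features(seq, offset=0, min_tail=8):
    """Return (tail_start, signal_start) as 0-based positions, or None.

    seq      -- cDNA sequence, or only its 3' end
    offset   -- 0-based position of seq[0] within the full insert
    min_tail -- minimum run of trailing A's accepted as a poly(A) stretch
    """
    seq = seq.upper()
    tail = re.search(r"A{%d,}$" % min_tail, seq)
    if tail is None:
        return None
    upstream = seq[:tail.start()]
    # Keep the hexamer closest to the tail; rfind returns -1 when absent.
    signal = max(upstream.rfind(s) for s in SIGNALS)
    return (offset + tail.start(),
            offset + signal if signal >= 0 else None)


# Last 43 bases of the DKFZphtes3_35b5 insert (bases 2001-2043).
tail_35b5 = "CCTTCCTAATAAAATAAACGCGGGTCGCCATGCAAAAAAAAAA"
print(find_polya_features(tail_35b5, offset=2000))  # (2033, 2012)
```

Under these assumptions the sketch reproduces the reported values (poly(A) stretch at 2033, signal at 2012).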



1 GGCGGCCATG GCGACGGCTC GAGTGCGGAT GGGGCCGCGG TGCGCCCAGG 
51 CGCTCTGGCG CATGCCGTGG CTGCCGGTGT TTTTGTCGTT GGCGGCGGCG 
101 GCGGCGGCGG CAGCGGCGGA GCAGCAGGTC CCGCTGGTGC TGTGGTCGAG 
151 TGACCGGGAC TTGTGGGCTC CTGCGGCCGA CACTCATGAA GGCCACATCA 
201 CCAGCGACTT GCAGCTCTCT ACCTACTTAG ATCCCGCCCT GGAGCTGGGT 
251 CCCAGGAATG TGCTGCTGTT CCTGCAGGAC AAGCTGAGCA TTGAGGATTT 
301 CACAGCATAT GGCGGTGTGT TTGGAAACAA GCAGGACAGC GCCTTTTCTA 
351 ACCTAGAGAA TGCCCTGGAC CTGGCCCCCT CCTCACTGGT GCTTCCTGCC 
401 GTCGACTGGT ATGCAGTCAG CACTCTGACC ACTTACCTGC AGGAGAAGCT 
451 CGGGGCCAGC CCCTTGCATG TGGACCTGGC CACCCTGCGG GAGCTGAAGC 
501 TCAATGCCAG CCTCCCTGCT CTGCTGCTCA TTCGCCTGCC CTACACAGCC 
551 AGCTCTGGTC TGATGGCACC CAGGGAAGTC CTCACAGGCA ACGATGAGGT 
601 CATCGGGCAG GTCCTGAGCA CACTCAAGTC CGAAGATGTC CCATACACAG 
651 CGGCCCTCAC AGCGGTCCGC CCTTCCAGGG TGGCCCGTGA TGTAGCCGTG 
701 GTGGCCGGAG GGCTAGGTCG CCAGCTGCTA CAAAAACAGC CAGTATCACC 
751 TGTGATCCAT CCTCCTGTGA GTTACAATGA CACCGCTCCC CGGATCCTGT 
801 TCTGGGCCCA AAACTTCTCT GTGGCGTACA AGGACCAGTG GGAGGACCTG 
851 ACTCCCCTCA CCTTTGGGGT GCAGGAACTC AACCTGACTG GCTCCTTCTG 
901 GAATGACTCC TTTGCCAGGC TCTCACTGAC CTATGAACGA CTCTTTGGTA 
951 CCACAGTGAC ATTCAAGTTC ATTCTGGCCA ACCGCCTCTA CCCAGTGTCT 
1001 GCCCGGCACT GGTTTACCAT GGAGCGCCTC GAAGTCCACA GCAATGGCTC 
1051 CGTCGCCTAC TTCAATGCTT CCCAGGTCAC AGGGCCCAGC ATCTACTCCT 
1101 TCCACTGCGA GTATGTCAGC AGCCTGAGCA AGAAGGGTAG TCTCCTCGTG 
1151 GCCCGCACGC AGCCCTCTCC CTGGCAGATG ATGCTTCAGG ACTTCCAGAT 
1201 CCAGGCTTTC AACGTAATGG GGGAGCAGTT CTCCTACGCC AGCGACTGTG 
1251 CCAGCTTCTT CTCCCCCGGC ATCTGGATGG GGCTGCTCAC CTCCCTGTTC 
1301 ATGCTCTTCA TCTTCACCTA TGGCCTGCAC ATGATCCTCA GCCTCAAGAC 
1351 CATGGATCGC TTTGATGACC ACAAGGGCCC CACTATTTCT TTGACCCAGA 
1401 TTGTGTGACC CTGTGCCAGT GGGGGGGTTG AGGGTGGGAC GGTGTCCGTG 
1451 TTGTTGCTTT CCCACCCTGC AGCGCACTGG ACTGAAGAGC TTCCCTCTTC 
1501 CTACTGCAGC ATGAACTGCA AGCTCCCCTC AGCCCATCTT GCTCCCTCTT 
1551 CAGCCCGCTG AGGAGCTTTC TTGGGCTGCC CCCATCTCTC CCAACAAGGT 
1601 GTACATATTC TGCGTAGATG CTAGACCAAC CAGCTTCCCA GGGTTCGTCG 
1651 CTGTGAGGCG TAAGGGACAT GAATTCTAGG GTCTCCTTTC TCCTTATTTA 
1701 TTCTTGTGGC TACATCATCC CTGGCTGTGG ATAGTGCTTT TGTGTAGCAA 
1751 ATGCTCCCTC CTTAAGGTTA TAGGGCTCCC TGAGTTTGGG AGTGTGGAAG 
1801 TACTACTTAA CTGTCTGTCC TGCTTGGCTG CCGTTATCGT TTTCTGGTGA 
1851 TGTTGTGCTA ACAATAAGAA GTACACGGGT TTATTTCTGT GGCCTGAGAA 
1901 GGAAGGGACC TCCACGACAG GTGGGCTGGG TGCGATCGCC GGCTGTTTGG 
1951 CATGTTCCCA CCGGGAGTGC CGGGCAGGAG CATGGGGTGC TTGGTTGTTT 
2001 CCTTCCTAAT AAAATAAACG CGGGTCGCCA TGCAAAAAAA AAA 



BLAST Results 



No BLAST result 
Medline entries



95014142: 

A novel accessory subunit for vacuolar H(+)-ATPase from chromaffin
granules.

97215246: 

Identification of a rat brain gene associated with aging by 
PCR differential display method. 



Peptide information for frame 2 

ORF from 8 bp to 1405 bp; peptide length: 466 
Category: strong similarity to known protein 
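
The ORF coordinates and the peptide length are tied together by simple arithmetic: an ORF from bp 8 to bp 1405 spans 1398 bases, which is 466 codons. A minimal sketch of that check and of the translation itself follows. It assumes the standard genetic code and 1-based inclusive coordinates that exclude the stop codon; the names `orf_codons` and `translate` are illustrative only.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def orf_codons(start, end):
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return length // 3


def translate(dna):
    """Translate a DNA string in frame 1, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(orf_codons(8, 1405))                          # 466
# Bases 8-37 of the DKFZphtes3_35b5 insert.
print(translate("ATGGCGACGGCTCGAGTGCGGATGGGGCCG"))  # MATARVRMGP
```

The same arithmetic holds for the other entries in this section (368 to 679 gives 104 residues; 84 to 1529 gives 482).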



1 MATARVRMGP RCAQALWRMP WLPVFLSLAA AAAAAAAEQQ VPLVLWSSDR 
51 DLWAPAADTH EGHITSDLQL STYLDPALEL GPRNVLLFLQ DKLSIEDFTA 
101 YGGVFGNKQD SAFSNLENAL DLAPSSLVLP AVDWYAVSTL TTYLQEKLGA 
151 SPLHVDLATL RELKLNASLP ALLLIRLPYT ASSGLMAPRE VLTGNDEVIG 
201 QVLSTLKSED VPYTAALTAV RPSRVARDVA VVAGGLGRQL LQKQPVSPVI 
251 HPPVSYNDTA PRILFWAQNF SVAYKDQWED LTPLTFGVQE LNLTGSFWND 
301 SFARLSLTYE RLFGTTVTFK FILANRLYPV SARHWFTMER LEVHSNGSVA 
351 YFNASQVTGP SIYSFHCEYV SSLSKKGSLL VARTQPSPWQ MMLQDFQIQA 
401 FNVMGEQFSY ASDCASFFSP GIWMGLLTSL FMLFIFTYGL HMILSLKTMD 
451 RFDDHKGPTI SLTQIV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35b5, frame 2 

TREMBL:AF035387_1 gene: "C7-1"; product: "C7-1 protein"; Rattus 
norvegicus C7-1 protein (C7-1) mRNA, complete cds., N = 1, Score = 
2088, P = 3.8e-216 

PIR:A55116 vacuolar ATPase (EC 3.6.1.-) chain Ac45 - bovine, N = 1,
Score = 2011, P = 5.5e-208

PIR:I54197 hypothetical protein - human, N = 1, Score = 1464, P =
5.1e-150

>TREMBL:AF035387_1 gene: "C7-1"; product: "C7-1 protein"; Rattus norvegicus
C7-1 protein (C7-1) mRNA, complete cds.
Length = 463

HSPs:

Score = 2088 (313.3 bits), Expect = 3.8e-216, P = 3.8e-216
Identities = 408/463 (88%), Positives = 426/463 (92%)

Query:   4 ARVRMGPRCAQALWRMPWLPVFLSLAAAAAAAAAEQQVPLVLWSSDRDLWAPAADTHEGH 63
           +R+R G R A  LW      + LSL A AAA AAEQQVPLVLWSSDRDLWAP ADTHEGH
Sbjct:   8 SRIRTGTRWAPVLW------LLLSLVAVAAAVAAEQQVPLVLWSSDRDLWAPVADTHEGH 61

Query:  64 ITSDLQLSTYLDPALELGPRNVLLFLQDKLSIEDFTAYGGVFGNKQDSAFSNLENALDLA 123
           ITSD+QLSTYLDPALELGPRNVLLFLQDKLSIEDFTAYGGVFGNKQDSAFSNLENALDLA
Sbjct:  62 ITSDMQLSTYLDPALELGPRNVLLFLQDKLSIEDFTAYGGVFGNKQDSAFSNLENALDLA 121

Query: 124 PSSLVLPAVDWYAVSTLTTYLQEKLGASPLHVDLATLRELKLNASLPALLLIRLPYTASS 183
           PSSLVLPAVDWYA+STLTTYLQEKLGASPLHVDLATL+ELKLNASLPALLLIRLPYTASS
Sbjct: 122 PSSLVLPAVDWYAISTLTTYLQEKLGASPLHVDLATLKELKLNASLPALLLIRLPYTASS 181

Query: 184 GLMAPREVLTGNDEVIGQVLSTLKSEDVPYTAALTAVRPSRVARDVAVVAGGLGRQLLQK 243
           GLMAPREVLTGNDEVIGQVLSTL+SEDVPYTAALTAVRPSRVARDVA+VAGGLGRQLLQ
Sbjct: 182 GLMAPREVLTGNDEVIGQVLSTLESEDVPYTAALTAVRPSRVARDVAMVAGGLGRQLLQT 241

Query: 244 QPVSPVIHPPVSYNDTAPRILFWAQNFSVAYKDQWEDLTPLTFGVQELNLTGSFWNDSFA 303
           Q  SP IHPPVSYNDTAPRILFWAQNFSVAYKD+W+DLT LTFGV+ LNLTGSFWNDSFA
Sbjct: 242 QVASPAIHPPVSYNDTAPRILFWAQNFSVAYKDEWKDLTSLTFGVENLNLTGSFWNDSFA 301

Query: 304 RLSLTYERLFGTTVTFKFILANRLYPVSARHWFTMERLEVHSNGSVAYFNASQVTGPSIY 363
            LSLTYE LFG TVTFKFILA+R YPVSAR+WFTMERLE+HSNGSVA+FN SQVTGPSIY
Sbjct: 302 MLSLTYEPLFGATVTFKFILASRFYPVSARYWFTMERLEIHSNGSVAHFNVSQVTGPSIY 361

WO 01/12659 PCT/IB00/01496

Query: 364 SFHCEYVSSLSKKGSLLVARTQPSPWQMMLQDFQIQAFNVMGEQFSYASDCASFFSPGIW 423
           SFHCEYVSSLSKKGSLLV    PS WQM L +FQIQAFNV GEQFSYASDCA FFSPGIW
Sbjct: 362 SFHCEYVSSLSKKGSLLVTNV-PSLWQMTLHNFQIQAFNVTGEQFSYASDCAGFFSPGIW 420

Query: 424 MGLLTSLFMLFIFTYGLHMILSLKTMDRFDDHKGPTISLTQIV 466
           MGLLT+LFMLFIFTYGLHMILSLKTMDRFDD KGPTI+LTQIV
Sbjct: 421 MGLLTTLFMLFIFTYGLHMILSLKTMDRFDDRKGPTITLTQIV 463
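
The identity counts in the HSP summary can be recomputed from any aligned segment. The sketch below counts identical columns for the last segment of the alignment above (query 424-466 against subject 421-463) and converts counts to whole-number percentages. Truncating rather than rounding is an assumption, chosen because it agrees with the percentages printed in these reports (for example 21/68 shown as 30%). The function names are illustrative.

```python
def hsp_identity(query, sbjct):
    """Return (identical_columns, aligned_columns) for two aligned strings."""
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return identical, len(query)


def blast_percent(count, total):
    """Whole-number percentage, truncated (assumed reporting convention)."""
    return 100 * count // total


query = "MGLLTSLFMLFIFTYGLHMILSLKTMDRFDDHKGPTISLTQIV"
sbjct = "MGLLTTLFMLFIFTYGLHMILSLKTMDRFDDRKGPTITLTQIV"
identical, aligned = hsp_identity(query, sbjct)
print(identical, aligned, blast_percent(identical, aligned))  # 40 43 93
print(blast_percent(408, 463), blast_percent(426, 463))       # 88 92
```

Counting "positives" as well would need a substitution matrix such as BLOSUM62, which is deliberately left out of this sketch.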

Pedant information for DKFZphtes3_35b5, frame 2 
Report for DKFZphtes3_35b5.2

[LENGTH] 466

[MW] 51621.44

[HOMOL] TREMBL:AF035387_1 gene: "C7-1"; product: "C7-1 protein"; Rattus norvegicus C7-1 protein (C7-1) mRNA, complete cds. 0.0

[PIRKW] hydrolase 0.0

[PROSITE] MYRISTYL 7

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 7

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 8

[PROSITE] ASN_GLYCOSYLATION 7

[KW] SIGNAL_PEPTIDE 38 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 11.59 % 

SEQ MATARVRMGPRCAQALWRMPWLPVFLSLAAAAAAAAAEQQVPLVLWSSDRDLWAPAADTH
SEG xxxxxxxxx
PRD ccceeeecccchhhhhhhcccchhhhhhhhhhhhhhhhhccceeeecccccccccccccc
MEM

SEQ EGHITSDLQLSTYLDPALELGPRNVLLFLQDKLSIEDFTAYGGVFGNKQDSAFSNLENAL
SEG
PRD ccccccchhhhhccccccccccccceeecccccccccccccccccccccchhhhhhhhcc
MEM

SEQ DLAPSSLVLPAVDWYAVSTLTTYLQEKLGASPLHVDLATLRELKLNASLPALLLIRLPYT
SEG xxxxxxxxxxxxxxx
PRD ccccccccccccceeeeehhhhhhhhhhccccchhhhhhhhhhhhhhcchhhhhhhcccc
MEM

SEQ ASSGLMAPREVLTGNDEVIGQVLSTLKSEDVPYTAALTAVRPSRVARDVAVVAGGLGRQL
SEG xxxxxxxxxxxxxxxxxxxx
PRD cccccceeeeeecccccchhhhhhhccccccchhhhhhhccccceeehhhhhccccchhh
MEM

SEQ LQKQPVSPVIHPPVSYNDTAPRILFWAQNFSVAYKDQWEDLTPLTFGVQELNLTGSFWND
SEG
PRD hhhhccccccccccccccccceeeeeccccceeeeccccccccceeeeeecccccccccc
MEM

SEQ SFARLSLTYERLFGTTVTFKFILANRLYPVSARHWFTMERLEVHSNGSVAYFNASQVTGP
SEG
PRD hhhhhhhhhhhhccceeeeeeecccccccccchhhhhhhhhhcccccceeeeeecccccc
MEM

SEQ SIYSFHCEYVSSLSKKGSLLVARTQPSPWQMMLQDFQIQAFNVMGEQFSYASDCASFFSP
SEG xxxxxxxxxx
PRD ceeeeeeeeeeecccccceeeeeccccchhhhhhhhheeeeccccccccccccccccccc
MEM MMMMMM

SEQ GIWMGLLTSLFMLFIFTYGLHMILSLKTMDRFDDHKGPTISLTQIV
SEG
PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccceeeeccc
MEM MMMMMMMMMMMMMMMMMMMMMMM



Prosite for DKFZphtes3_35b5.2

PS00001 166->170 ASN_GLYCOSYLATION PDOC00001
PS00001 257->261 ASN_GLYCOSYLATION PDOC00001
PS00001 269->273 ASN_GLYCOSYLATION PDOC00001
PS00001 ASN_GLYCOSYLATION PDOC00001
PS00001 ASN_GLYCOSYLATION PDOC00001
PS00001 ASN_GLYCOSYLATION PDOC00001
PS00001 ASN_GLYCOSYLATION PDOC00001
PS00004 CAMP_PHOSPHO_SITE PDOC00004
PS00005 3->6 PKC_PHOSPHO_SITE PDOC00005
PS00005 PKC_PHOSPHO_SITE PDOC00005
PS00005 159->162 PKC_PHOSPHO_SITE PDOC00005
PS00005 PKC_PHOSPHO_SITE PDOC00005
PS00005 318->321 PKC_PHOSPHO_SITE PDOC00005
PS00005 331->334 PKC_PHOSPHO_SITE PDOC00005
PS00005 PKC_PHOSPHO_SITE PDOC00005
PS00005 PKC_PHOSPHO_SITE PDOC00005
PS00006 CK2_PHOSPHO_SITE PDOC00006
PS00006 72->76 CK2_PHOSPHO_SITE PDOC00006
PS00006 94->98 CK2_PHOSPHO_SITE PDOC00006
PS00006 114->118 CK2_PHOSPHO_SITE PDOC00006
PS00006 CK2_PHOSPHO_SITE PDOC00006
PS00006 CK2_PHOSPHO_SITE PDOC00006
PS00006 255->259 CK2_PHOSPHO_SITE PDOC00006
PS00007 207->214 TYR_PHOSPHO_SITE PDOC00007
PS00008 102->108 MYRISTYL PDOC00008
PS00008 103->109 MYRISTYL PDOC00008
PS00008 200->206 MYRISTYL PDOC00008
PS00008 295->301 MYRISTYL PDOC00008
PS00008 314->320 MYRISTYL PDOC00008
PS00008 421->427 MYRISTYL PDOC00008
PS00008 425->431 MYRISTYL PDOC00008



(No Pfam data available for DKFZphtes3_35b5.2)
DKFZphtes3_35e21



group: differentiation/development 

DKFZphtes3_35e21.2 encodes a novel 104 amino acid putative interleukin precursor, related to
interleukin-7.

Due to its close relationship to human interleukin-7, the novel interleukin is expected to act
as a new growth factor for human B-lineage cells. Additionally, the protein should induce
gene rearrangement of the T-cell receptor repertoire, leading to thymocyte commitment, and
subsequently induce both cytotoxic T cells and lymphocyte-activated killer cells.

This new interleukin could find clinical application in a variety of conditions of
hematolymphopoietic failure and in different tumours, because it recruits B-lineage
cells, cytotoxic T cells and lymphocyte-activated killer cells.



similarity to interleukin-7 precursor 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2095 bp 

Poly A stretch at pos. 2085, polyadenylation signal at pos. 2067 



1 GGATGAAAGT GATTTAATTC ATTTTTAGAA TTTTTTTTTT GTTTTGTTTT 
51 AGCAACATGC TGAACAACTA ATTTACTTTA AAAATAAGCC AGTTAAAACA 
101 AAGGACGCTA AGCCCAAGTG GGGGGCAATA TTAGTCAGGA TCTTTGGGGT 
151 CTAATTCCAG ACCAACTTTC AGAAGCACTT CTTTGTCTCT GTTCTCACCT 
201 CTGCTGTCCC TCTCTTCCCT CATCCCCTAA GAGAGACAAA GATAAAAGCC 
251 CACCTGCATC CCTAAGTCTT ACTGAGATCA GCCACCCCAG GGGAGAGAAA 
301 CTGGATCTAC TTACAGCCAC CCCCTGTTTC CATCCATATA CTTACTTCCC 
351 CCAATTTGCA TGTGATTATG GAAACAAGTC ATGCTCATGA AAGCAACTGT 
401 AAAATAAAAG GTTATGGAGT AGTTCAGCAA CTTCTTCACA GCCAGCTTTG 
451 TGGAGCTGGG GAGGACTTAG GGCCCATTGG AGTCTCTTAT GTGTACAGCT 
501 TCAGGGCTGT CCCTTTCAGT TTGATTTTAA GCAATGCCTC ACTTCATAGC 
551 TTAGGGGGTA AGGATTCCAT TCAGGTAGGT TGTCTAAAGG AACTAATGGG 
601 ACCTCTCAGT GAATTAGCTG ACCAGATTTT AGGAAATCTT TTTAATTTCT 
651 ATGATTTTCC TTCTCACATT TTGAAATGGT AAAATTGACT GGAAATAATT 
701 TTTCTTGGTG CCTTATTGGT TTTCCTTGCA AACCTTTCTC ATATTTTCTC 
751 ATGACCATTG CCAGTGACCA AGGCCCATGT GTGTGTTGTG TGTAATTGTG 
801 GGCATGTACA AGCTTAAATA ACGTGCCGAC AGCACTGTTT CAAAGTTGGT 
851 ATTCATTAGG CTGTTGCCTC CTGGGCTGGA GCTGCGCTAA TCCTGACACC 
901 GGCTGCCAGG AGAAAACCTC ATGGATCACA CACCAAACCT TAATAACAGC 
951 ATCCGTGACC TGCACTCTCC AGTACAGAAT GGGAACCCCA GAGCTAGGAA 
1001 ATGTAGTTGT ATATTTTAAT GAACTGCTAC CCCAGCCAAA GAAGCTTCTT 
1051 TCACTTTTGT GCTCTACAGA AAGCCCAAGG GGGGTAGGAG GGACAGAGCT 
1101 TTGAATAACT GCTTTCTAAC ACTAAATGTG GCCAACAGGA CAGAGCACAT 
1151 CACACGTATA GGCAGGTGTG AGGGACAGTG GCTAAGAATT GCCTGCTCCC 
1201 TCTGCATGCT CTTTCTTGTT TCCAAAGTCC AATCAAGTGA TCCTGGGAAA 
1251 CAAATCTGTC TGGATTGCGG AGGGTGGTTC TGAAAGAACT GCCAAGACGT 
1301 TAAAGAAGGG TGAAGAGTAG GCAGAATATA AGTAGCTAAC CTGAGTCAAG 
1351 ACTCTCAAAA GCTAGCAGCC TGATGACAAT AGGATTTATT TCAGCCAGGA 
1401 TAGTGTCTGT CTGTGAGTGC ATCATTTTAA GACAGTATGA CTTCATGTTG 
1451 TTACAAACTA TGTATAGTAT GTATGTTTTG TGGGTTGTAT ATATACATAA 
1501 TATATATTAT ATATATATAT GAGAGATTTG GTGACTTTTG ATACGGGTTT 
1551 GGTGCAGGTG AATTTATTAC TGAGCCAAAT GAGGCACATA CCGAGTCAGT 
1601 AGTTGAAGTC CAGGGCATTC GATACTGTTT ATGATTTCCA TATATGTATA 
1651 GTGCCTATCC CATGCTGTAG TCACTGTTAT GTTAAATCCA GAAGTTACAC 
1701 TAGAGCCAGC GATACTTTAT TTGTAGACAA TCAATTTGAA TCCATATGTT 
1751 ATTACTGGCA GATGATACAT GATTACAGTT CTGAATCTGT AACACTTACA 
1801 AAAGGAAACC CAGAGCAGCT TGATGAGTTT TTGTTTCTGC TTCGTTCCTG 
1851 GGAGTCAGTA GAAACAGCAG TTGTATGTGG TTATGTTAGT CTCAAGATAC 
1901 TTAATTTGTT GACCTTACTT CAGAAAAATT TTGTATGTAT TATATTTGTG 
1951 GGAAGGTAAA ATAATCATTT GAGATTTTTA TCAAATATGA AGATTAGTTA 
2001 TTTATGAAAA ACAAAGAAAT GTCTATTTTT CTTTGTTCCC AATTAATGTA 
2051 GATAAATTTT AAAATGCATT AAAGTAATGG TCCGGAAAAA AAAAA 



BLAST Results 



No BLAST result 
Medline entries



89098903: 

Human interleukin 7: molecular cloning and growth factor 
activity on human and murine B-lineage cells. 



Peptide information for frame 2 



ORF from 368 bp to 679 bp; peptide length: 104 
Category: similarity to known protein 



1 METSHAHESN CKIKGYGVVQ QLLHSQLCGA GEDLGPIGVS YVYSFRAVPF 
51 SLILSNASLH SLGGKDSIQV GCLKELMGPL SELADQILGN LFNFYDFPSH 
101 ILKW 

BLASTP hits 

Entry B32223 from database PIR: 
interleukin-7 precursor (clone 1) - human 

Score = 66, P = 7.0e-01, identities = 21/70, positives = 33/70 



Alert BLASTP hits for DKFZphtes3_35e21, frame 2 

PIR:B32223 interleukin-7 precursor (clone 1) - human, N = 1, Score =
66, P = 0.72

TREMBL:PADAL1_1 gene: "dal1"; P.abies dal1 mRNA, N = 2, Score = 59, P
= 0.77

PIR:C32223 interleukin-7 precursor (clone 4) - human, N = 1, Score =
66, P = 0.79

TREMBL:PRU76726_1 gene: "PrMADS3"; product: "MADS-box protein"; Pinus
radiata MADS-box protein (PrMADS3) mRNA, complete cds., N = 2, Score =
59, P = 0.94

>PIR:B32223 interleukin-7 precursor (clone 1) - human
Length = 133

HSPs:

Score = 66 (9.9 bits), Expect = 1.3e+00, P = 7.2e-01
Identities = 21/68 (30%), Positives = 33/68 (48%)
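
The Expect and P values quoted for this HSP are related by the usual Poisson approximation, P = 1 - exp(-E): an E-value of about 1.3 corresponds to a probability of about 0.72-0.73 of seeing at least one such hit by chance. A minimal sketch follows; the function name is illustrative, and the assumption is that the report uses this standard conversion.

```python
import math


def pvalue_from_expect(expect):
    """P = 1 - exp(-E), computed with expm1 so tiny E-values stay accurate."""
    return -math.expm1(-expect)


print(round(pvalue_from_expect(1.3), 2))
print(pvalue_from_expect(3.8e-216))
```

For very small E-values the two quantities are numerically indistinguishable, which is why strong hits elsewhere in this section list the same number for Expect and P.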

Query: 39 VSYVYSFRAVPFSLIL SNASLHSLGGK-- DSIQVGCLKELMGPLSELADQILGNL 91 

VS+ Y F P L+L S+ + GK +S+ + + +L+ + E+ L N 

Sbjct: 4 VSFRYIFGLPPLILVLLPVASSDCDIEGKDGKQYESVLMVSIDQLLDSMKEIGSNCLNNE 63 

Query: 92 FNFYDFPSHI 101 

FNF F HI 
Sbjct: 64 FNF--FKRHI 71 

Pedant information for DKFZphtes3_35e21, frame 2 

Report for DKFZphtes3_35e21.2

[LENGTH] 104

[MW] 11339.12

[pI] 5.87

[PROSITE] MYRISTYL 2

[PROSITE] PKC_PHOSPHO_SITE 1

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta

SEQ METSHAHESNCKIKGYGVVQQLLHSQLCGAGEDLGPIGVSYVYSFRAVPFSLILSNASLH
PRD ccchhhhhcccccccchhhhhhhhhhhcccccccccceeeeeeeccccceeeeecccccc

SEQ SLGGKDSIQVGCLKELMGPLSELADQILGNLFNFYDFPSHILKW
PRD cccccceeeccccccccccchhhhhhhhcccccccccccccccc



Prosite for DKFZphtes3_35e21.2

PS00001 56->60 ASN_GLYCOSYLATION PDOC00001

PS00005 44->47 PKC_PHOSPHO_SITE PDOC00005

PS00008 63->69 MYRISTYL PDOC00008 

PS00008 89->95 MYRISTYL PDOC00008 
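
The Prosite hits listed above can be reproduced by scanning the 104-residue peptide with the three PROSITE patterns involved, written as regular expressions. This is a minimal sketch and not the Pedant pipeline itself: the `scan` function is illustrative, and the reporting convention (1-based start, end given as start plus pattern length) is inferred from the table.

```python
import re

# PROSITE patterns as regular expressions:
#   PS00001  N-{P}-[ST]-{P}
#   PS00005  [ST]-x-[RK]
#   PS00008  G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "MYRISTYL": r"G[^EDRKHPFYW]..[STAGCN][^P]",
}

PEPTIDE = ("METSHAHESNCKIKGYGVVQQLLHSQLCGAGEDLGPIGVSYVYSFRAVPF"
           "SLILSNASLHSLGGKDSIQVGCLKELMGPLSELADQILGNLFNFYDFPSH"
           "ILKW")


def scan(peptide):
    """Return {site name: [(start, end), ...]} including overlapping hits."""
    hits = {}
    for name, pattern in PATTERNS.items():
        # A lookahead lets finditer report matches that overlap each other.
        regex = re.compile("(?=(%s))" % pattern)
        hits[name] = [(m.start() + 1, m.start() + 1 + len(m.group(1)))
                      for m in regex.finditer(peptide)]
    return hits


for name, sites in scan(PEPTIDE).items():
    for start, end in sites:
        print("%s %d->%d" % (name, start, end))
```

Under these assumptions the scan yields one N-glycosylation site (56->60), one PKC site (44->47) and two myristoylation sites (63->69, 89->95), matching the report.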



(No Pfam data available for DKFZphtes3_35e21.2)
DKFZphtes3_35g6



group: testes derived 

DKFZphtes3_35g6 encodes a novel 482 amino acid protein with high partial similarity to H. 
sapiens chromosome 19, cosmid R27216. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



strong similarity to R27216_1
complete cDNA, complete cds, EST hits
Sequenced by DKFZ
Locus: /map="15"
Insert length: 3177 bp 

Poly A stretch at pos. 3167, polyadenylation signal at pos. 3148 



1 GGAGGCAGCG CCGGCCTCCG GAGGCGGCCT GGGCGATGGC GGCGGAGTTT 
51 TGTCCATAAC CTGGGCAACC GCGCAGCTGG AGGATGGCCT CACTCGGGCC 
101 TGCCGCAGCT GGGGAGCAGG CGTCGGGGGC TGAGGCGGAG CCGGGCCCCG 
151 CGGGGCCGCC GCCGCCGCCC TCACCGTCCT CTCTGGGGCC CCTGCTCCCC 
201 CTGCAGCGGG AACCTCTCTA CAACTGGCAG GCGACCAAGG CGTCGCTGAA 
251 GGAGCGCTTC GCCTTCCTCT TCAACTCGGA GCTGCTGAGC GATGTGCGCT 
301 TCGTACTGGG CAAGGGTCGC GGCGCCGCCG CCGCTGGGGG CCCGCAGCGC 
351 ATCCCCGCCC ACCGCTTCGT GCTGGCGGCC GGCAGCGCCG TCTTTGACGC 
401 CATGTTCAAC GGCGGCATGG CCACCACGTC GGCCGAGATC GAGCTGCCGG 
451 ACGTGGAGCC CGCAGCCTTC CTGGCGCTGC TGAGATTTCT ATATTCAGAT 
501 GAAGTTCAAA TTGGTCCAGA AACAGTTATG ACCACTCTTT ATACTGCCAA 
551 GAAATACGCA GTCCCAGCCT TGGAAGCACA CTGTGTAGAA TTTCTCACCA 
601 AACATCTTAG GGCAGATAAT GCCTTTATGT TACTTACTCA GGCTCGATTA 
651 TTTGATGAAC CTCAGCTTGC TAGTCTTTGT CTAGATACAA TAGACAAAAG 
701 CACAATGGAT GCAATAAGTG CAGAAGGGTT TACTGATATT GATATAGATA 
751 CACTCTGTGC AGTTTTAGAG AGAGACACAC TCAGTATTCG AGAAAGTCGA 
801 CTTTTTGGAG CTGTTGTACG CTGGGCAGAA GCAGAATGTC AGAGACAACA 
851 ATTACCTGTG ACTTTTGGGA ATAAACAAAA AGTTCTAGGA AAAGCACTTT 
901 CCTTAATCCG GTTCCCACTG ATGACAATTG AGGAATTTGC AGCAGGTCCT 
951 GCTCAATCTG GAATTTTGTC AGATCGTGAA GTGGTAAACC TCTTTCTTCA 
1001 TTTTACTGTC AACCCTAAAC CCCGAGTTGA ATACATTGAC CGACCAAGAT 
1051 GCTGTCTCAG GGGAAAGGAA TGCTGCATCA ATAGATTCCA GCAAGTAGAA 
1101 AGCCGCTGGG GTTACAGTGG GACGAGTGAT CGAATCAGAT TCACAGTTAA 
1151 TAGAAGGATC TCTATAGTTG GATTTGGCTT GTATGGATCT ATTCATGGCC 
1201 CTACAGATTA TCAAGTGAAT ATACAGATCA TTGAATATGA GAAAAAGCAA 
1251 ACCCTGGGAC AGAATGATAC CGGCTTTAGT TGTGATGGGA CAGCTAACAC 
1301 ATTCAGGGTC ATGTTCAAGG AACCCATAGA GATCCTGCCC AATGTGTGCT 
1351 ACACAGCATG TGCAACACTC AAAGGTCCAG ATTCCCACTA TGGCACAAAA 
1401 GGATTGAAGA AAGTAGTGCA TGAGACACCT GCTGCAAGCA AGACTGTTTT 
1451 TTTCTTTTTT AGTTCCCCTG GCAATAATAA TGGCACTTCA ATAGAAGATG 
1501 GACAAATTCC AGAAATCATA TTTTATACAT AATTTAGCAT TATAATACAT 
1551 CTTGGCTAAA TAATACCATA CAATCTAGTG TCAAAAACAT AAATGGCCAC 
1601 AAAAAAGTAG TTTGAGTGTT ATGAATATTT AAAATTGTAA GATAAGAAAC 
1651 AGTTTCTTAG AGCAGATAGA AAAATGCTTA TTTAAATCTT TGCATGATTT 
1701 AAAAACAGAT TTTCCATTTT CTTACAACTT TAAGAGAAAA GAACTGGGTT 
1751 TAATGGTTTA AAAAAAAGCA CAGCTTTTTC ACCTTCATCT TGTATAATTT 
1801 CATAGATTGG CTGACTTAGG GTCTTTCAAT AGTTTGGGAA TTGAAAGATT 
1851 CTTGTTATAT ATAGCTAGTT TGGGTTTGTT TTTGTTTTAA CTATTTTGAA 
1901 GGTTAGGTGA GATGGGCAAA TAGGCTTAAC TATTTTGAAG GTTGGATGAA 
1951 AAGAGATGGG TCAGTATTCC TACAGAATTC TTATTAACTC AAATAACTAA 
2001 ATTTCAGAAA ATTAAGAAGC TGACTTTATA TTTGGTGGTT TGAAGTATCT 
2051 TGTTGTTAGC ATTTGTAATA ATGCTAAAAA AGGCCTAATA AAATGCCCAA 
2101 GAAAATATTC AGTGCATTTA TAGAGAAGGA TATTTTGTAG TAGTATAGTA 
2151 ATGTGTTATG TAGTACAGTT TTAAAGCTAT AAATGGAATT TTGTGTAAAT 
2201 TCACAAAAAT GTGATATAAA CAGGATCTAA GACTGGATTC CCTGTCACTA 
2251 AACTGCACCA CTATACCTGT CfCTCTGTGT GGGGGACACT GCTGATGATT 
2301 CCCAAGATTG AGATGATGAC GGTGATGACG ACTGGGTGAA CAGCCATCAC 
2351 TTCAACATTG TGATAATCCT TCACAGCAAG AAACCGAATA AAATACTAAC 
2401 ATTTCTAACA ACTGCTCTGA CATTGTAAAG AGATCCAACA GAATCACTCC 
2451 TGCTGAAAAA TACGCTTTCT GCCACCTACA CATTTCTATT TAGGAAGTAA 
2501 AATTTGCTTC ATGGTCATGA CCCCATTAGT CAGTGTTACA GCTGTGTTGG 
2551 GGATAGGAAG TATATCTGGC AGATTGACAT TTATACACTT TTTTATAAAG 
2601 CAGATTTTAA AATATAGTAA CATCCATTTT TTTCCCTTGA AAGTGATTCT 
2651 CTTATAAAAA ATGAAAGTGG AGTTTAAGGT ATATCAAATC GTTGTGGAAG 
2701 GTGATTAAAA ATCAAAATTC TTTTAAATAT CAACTTAATT TTTTCTAAGT 
2751 AAGATACAAA AAATTTTCAT CTAAAGTAAT ATTTCACTTT ATATTGTAAA 
2801 GAAGGTAGGT ATATTGGTGG CTGAGGTCTC TTGAAATTGC TAAAGGGAAA 
2851 TTTTTCTATG GTAATGCTCT TACGGATATA AGCCTCAGTT AAATGGAATT 
2901 ATCTATGGGA TGTGTGGTTC TGGTTAACTA AAAATTAACC AGTAAACACT 
2951 CTGTAGTAAC CATTACAGAA AATACTTCTG CCTTAAAAAA TATGATATGC 
3001 CAGAGATGAG TTAGTGTTTC TTGACGTTGG AGACCTATAA ATGCCTCATC 
3051 TGTTGTACTG AACAATTGAA ACTGCATGCA GCCATAAAAG GGACAAGAAA 
3101 CAGAACTGTT TACTAACTTT GGGACATCCC CTGGAGTTTT TAAAAATAAA 
3151 TAAATATATA TATATATAAA AAAAAAA 



BLAST Results 



Entry G37753 from database EMBL:
SHGC-63477 Human Homo sapiens STS genomic.
Score = 1627, P = 3.0e-66, identities = 327/329

Entry G37752 from database EMBL:
SHGC-63476 Human Homo sapiens STS genomic.
Score = 1578, P = 6.2e-64, identities = 320/324



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 84 bp to 1529 bp; peptide length: 482 
Category: similarity to unknown protein 



1 MASLGPAAAG EQASGAEAEP GPAGPPPPPS PSSLGPLLPL QREPLYNWQA
51 TKASLKERFA FLFNSELLSD VRFVLGKGRG AAAAGGPQRI PAHRFVLAAG
101 SAVFDAMFNG GMATTSAEIE LPDVEPAAFL ALLRFLYSDE VQIGPETVMT
151 TLYTAKKYAV PALEAHCVEF LTKHLRADNA FMLLTQARLF DEPQLASLCL
201 DTIDKSTMDA ISAEGFTDID IDTLCAVLER DTLSIRESRL FGAVVRWAEA
251 ECQRQQLPVT FGNKQKVLGK ALSLIRFPLM TIEEFAAGPA QSGILSDREV
301 VNLFLHFTVN PKPRVEYIDR PRCCLRGKEC CINRFQQVES RWGYSGTSDR
351 IRFTVNRRIS IVGFGLYGSI HGPTDYQVNI QIIEYEKKQT LGQNDTGFSC
401 DGTANTFRVM FKEPIEILPN VCYTACATLK GPDSHYGTKG LKKVVHETPA
451 ASKTVFFFFS SPGNNNGTSI EDGQIPEIIF YT

BLASTP hits

Entry AC005306_2 from database TREMBL:
product: "R27216_1"; Homo sapiens chromosome 19, cosmid R27216,
complete sequence.
Score = 1298, P = 1.9e-132, identities = 245/297, positives = 268/297

Entry CEF38H4_9 from database TREMBLNEW:
gene: "F38H4.7"; Caenorhabditis elegans cosmid F38H4
Score = 1237, P = 5.6e-126, identities = 248/446, positives = 322/446

Entry AC004678_1 from database TREMBL:
product: "R34094_1"; Homo sapiens chromosome 19, cosmid R34094,
complete sequence.
Score = 555, P = 1.0e-53, identities = 112/137, positives = 123/137

Alert BLASTP hits for DKFZphtes3_35g6, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphtes3_35g6, frame 3



Report for DKFZphtes3_35g6.3



[LENGTH] 482
[MW] 52771.47
[pI] 5.79
[HOMOL] TREMBL:AC005306_2 product: "R27216_1"; Homo sapiens chromosome 19, cosmid R27216, complete sequence. 1e-142
[BLOCKS] BL01075D Acetate and butyrate kinases family proteins
[SUPFAM] POZ domain homology 3e-08
[SUPFAM] A55R protein middle region homology 5e-06
[SUPFAM] A55R protein 5e-06
[SUPFAM] A55R protein carboxyl-terminal homology 5e-06
[PROSITE] MYRISTYL 6
[PROSITE] CAMP_PHOSPHO_SITE 2
[PROSITE] CK2_PHOSPHO_SITE 9
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 7
[PROSITE] ASN_GLYCOSYLATION 2
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 11.20 %



SEQ MASLGPAAAGEQASGAEAEPGPAGPPPPPSPSSLGPLLPLQREPLYNWQATKASLKERFA
SEG ....xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
PRD cccccccchhhhhhhhcccccccccccccccccccccccccccccchhhhhhhhhhhhhh

SEQ FLFNSELLSDVRFVLGKGRGAAAAGGPQRIPAHRFVLAAGSAVFDAMFNGGMATTSAEIE
SEG xxxxxxxxxxx
PRD hhhccccccceeeeecccccccccccccchhhhheeecccchhhhhhhhcchhhhhhhee

SEQ LPDVEPAAFLALLRFLYSDEVQIGPETVMTTLYTAKKYAVPALEAHCVEFLTKHLRADNA
SEG
PRD ecccchhhhhhhhhhhhccceeechhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccch

SEQ FMLLTQARLFDEPQLASLCLDTIDKSTMDAISAEGFTDIDIDTLCAVLERDTLSIRESRL
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhccccchhhhhh

SEQ FGAVVRWAEAECQRQQLPVTFGNKQKVLGKALSLIRFPLMTIEEFAAGPAQSGILSDREV
SEG
PRD hhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhcceeecccccccccccccchhhhh

SEQ VNLFLHFTVNPKPRVEYIDRPRCCLRGKECCINRFQQVESRWGYSGTSDRIRFTVNRRIS
SEG
PRD hhhhheeeccccceeeeecccceeeccceeehhhhhhhhhccccccccccchhhhhceee

SEQ IVGFGLYGSIHGPTDYQVNIQIIEYEKKQTLGQNDTGFSCDGTANTFRVMFKEPIEILPN
SEG
PRD eeeccccccccccchhhhhhhcchhhhhhhhccccccccccccccceeeeeccceeeccc

SEQ VCYTACATLKGPDSHYGTKGLKKVVHETPAASKTVFFFFSSPGNNNGTSIEDGQIPEIIF
SEG xxxxxx
PRD ccceeeeecccccccccccceeeeeeeccccceeeeeeeecccccccccccccccceeec

SEQ YT
SEG
PRD cc



Prosite for DKFZphtes3_35g6.3 



PS00001 394->398 ASN_GLYCOSYLATION PDOC00001
PS00001 466->470 ASN_GLYCOSYLATION PDOC00001
PS00004 357->361 CAMP_PHOSPHO_SITE PDOC00004
PS00004 387->391 CAMP_PHOSPHO_SITE PDOC00004
PS00005 54->57 PKC_PHOSPHO_SITE PDOC00005
PS00005 154->157 PKC_PHOSPHO_SITE PDOC00005
PS00005 234->237 PKC_PHOSPHO_SITE PDOC00005
PS00005 296->299 PKC_PHOSPHO_SITE PDOC00005
PS00005 348->351 PKC_PHOSPHO_SITE PDOC00005
PS00005 406->409 PKC_PHOSPHO_SITE PDOC00005
PS00005 428->431 PKC_PHOSPHO_SITE PDOC00005
PS00006 14->18 CK2_PHOSPHO_SITE PDOC00006
PS00006 54->58 CK2_PHOSPHO_SITE PDOC00006
PS00006 115->119 CK2_PHOSPHO_SITE PDOC00006
PS00006 206->210 CK2_PHOSPHO_SITE PDOC00006
PS00006 217->221 CK2_PHOSPHO_SITE PDOC00006
PS00006 234->238 CK2_PHOSPHO_SITE PDOC00006
PS00006 281->285 CK2_PHOSPHO_SITE PDOC00006
PS00006 296->300 CK2_PHOSPHO_SITE PDOC00006
PS00006 468->472 CK2_PHOSPHO_SITE PDOC00006
PS00007 430->437 TYR_PHOSPHO_SITE PDOC00007
PS00008 80->86 MYRISTYL PDOC00008
PS00008 110->116 MYRISTYL PDOC00008
PS00008 365->371 MYRISTYL PDOC00008
PS00008 392->398 MYRISTYL PDOC00008
PS00008 402->408 MYRISTYL PDOC00008
PS00008 463->469 MYRISTYL PDOC00008



(No Pfam data available for DKFZphtes3_35g6.3)

DKFZphtes3_35kl6



group: metabolism 

DKFZphtes3_35kl6 encodes a novel 666 amino acid protein with weak similarity to fatty acid-CoA
synthetases/ligases.

The novel protein contains a putative AMP-binding domain signature, which is present in
enzymes that act via ATP-dependent covalent binding of AMP to their substrate. This
domain is found in several CoA synthetases, such as acetate-CoA ligase (EC 6.2.1.1),
long-chain-fatty-acid-CoA ligase (EC 6.2.1.3), and bile acid-CoA ligase. Therefore, it is a
new fatty acid-CoA synthetase/ligase with unknown substrate.

The new protein can find application in modulation of fatty acid metabolism and as a new
enzyme for biotechnological production processes.



similarity to acyl-CoA synthetase 

complete cDNA, complete cds, potential start codon at bp 50,
few EST hits, seems to be a testis-specific cDNA,
5 of 6 EST hits are from testis-derived libraries

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2520 bp 

Poly A stretch at pos. 2510, polyadenylation signal at pos. 2490



1 CAGATGTCCC AGCTCCAGTG CTGTGGAGCA TGGTTTCTGC ACACCTGGAA 
51 TGACTGGAAC CCCAAAGACT CAAGAAGGAG CTAAAGATCT TGAAGTAGAC 
101 ATGAATAAAA CAGAAGTTAC TCCCAGGCTG TGGACCACCT GTCGAGATGG 
151 AGAAGTCCTT CTGAGGCTAT CCAAACACGG ACCAGGCCAT GAGACCCCGA 
201 TGACCATCCC TGAATTTTTT CGAGAGTCAG TCAACCGATT TGGAACTTAT 
251 CCAGCCCTCG CATCCAAGAA TGGCAAAAAG TGGGAAATTC TGAATTTCAA 
301 CCAGTACTAT GAGGCTTGTC GGAAGGCTGC AAAATCCTTG ATCAAGCTGG 
351 GTTTGGAGCG TTTCCACGGA GTTGGTATCC TGGGGTTTAA CTCTGCAGAG 
401 TGGTTTATCA CTGCTGTTGG TGCCATCCTA GCCGGGGGTC TTTGTGTTGG 
451 TATTTATGCC ACCAACTCTG CCGAGGCTTG TCAATATGTC ATCACTCATG 
501 CCAAAGTGAA CATCTTGCTG GTTGAGAATG ATCAACAGTT ACAGAAAATC 
551 CTTTCGATTC CACAGAGCAG CCTAGAGCCC CTAAAAGCGA TCATCCAGTA 
601 CAGACTGCCA ATGAAGAAGA ACAACAACTT GTACTCTTGG GATGATTTCA 
651 TGGAACTTGG CAGAAGTATC CCTGACACCC AACTGGAGCA GGTCATCGAG 
701 AGCCAGAAGG CGAATCAATG CGCAGTGCTC ATCTACACTT CAGGGACCAC 
751 AGGCATACCC AAGGGAGTGA TGCTCAGTCA TGACAACATC ACGTGGATTG 
801 CAGGAGCAGT GACAAAGGAC TTTAAACTGA CAGACAAGCA TGAGACGGTG 
851 GTTAGCTACC TCCCACTCAG CCATATTGCA GCACAGATGA TGGACATCTG 
901 GGTACCCATA AAGATTGGGG CGCTCACATA CTTTGCTCAA GCAGATGCTC 
951 TCAAGGGCAC CTTGGTAAGT ACTCTAAAGG AGGTAAAACC TACTGTCTTC 
1001 ATTGGAGTGC CTCAAATTTG GGAGAAGATA CATGAGATGG TGAAGAAAAA 
1051 TAGTGCCAAG TCCATGGGCT TGAAGAAGAA GGCATTCGTG TGGGCAAGAA 
1101 ACATTGGCTT CAAGGTCAAC TCAAAAAAGA TGTTGGGGAA ATATAATACT 
1151 CCCGTGAGCT ACCGCATGGC TAAGACTCTC GTGTTCAGCA AAGTCAAGAC 
1201 ATCCCTTGGC TTGGATCACT GTCACTCTTT TATCAGTGGG ACTGCGCCCC 
1251 TCAACCAAGA GACTGCCGAG TTCTTTCTAA GCTTGGACAT ACCTATAGGC 
1301 GAGTTGTATG GGTTGAGTGA GAGCTCGGGA CCCCACACGA TATCCAACCA 
1351 GAATAACTAC AGGCTTCTAA GCTGTGGCAA GATCTTGACT GGGTGTAAGA 
1401 ATATGCTGTT CCAGCAGAAC AAGGATGGCA TTGGGGAGAT CTGCCTCTGG 
1451 GGTAGGCACA TCTTCATGGG CTATCTGGAA AGTGAGACTG AAACTACAGA
1501 GGCCATCGAT GATGAAGGCT GGCTACACTC TGGGGATCTG GGCCAGCTGG 
1551 ACGGTCTGGG TTTCCTCTAT GTCACCGGCC ACATCAAAGA AATCCTTATC 
1601 ACTGCTGGTG GTGAAAATGT GCCCCCCATT CCTGTTGAGA CCTTGGTTAA 
1651 GAAGAAGATC CCCATCATCA GTAACGCCAT GTTAGTAGGA GATAAACTGA 
1701 AGTTTCTGAG CATGTTGCTG ACGCTGAAGT GTGAGATGAA TCAGATGAGC 
1751 GGAGAACCTC TGGACAAGCT GAACTTCGAG GCCATCAACT TCTGTCGGGG 
1801 TCTGGGCAGC CAGGCATCCA CCGTGACTGA GATGGTGAAG CAGCAAGACC 
1851 CCCTGGTCTA CAAGGCCATC CAGCAAGGCA TCAATGCTGT GAACCAGGAA 
1901 GCCATGAACA ATGCACAGAG GATTGAAAAG TGGGTCATCT TGGAGAAGGA 
1951 CTTTTCCATC TATGGTGGAG AGCTAGGTCC AATGATGAAA CTTAAGAGAC 
2001 ATTTTGTAGC CCAGAAATAC AAAAAACAAA TTGATCACAT GTACCACTGA 
2051 CTGCTTTGAT GGAGCTGCTC TCAGCTGTTC TGATGCCTTC AGCAGGAAGA 
2101 CCTCATTGCA ATAAGTGAAA TGCTGCTCTA GGTAGAAGCT CTCCCTGCTG 
2151 TTTTTAAGAA GCCACATTCC TCATTGGTCA GTTTCTTGAT TGTTCGTCTG 
2201 TTGGAGAGGT GCTCCCTAGA AGAACCTGCC ATACGTTTCA AAGCAATAAA 
2251 ATCACTGTAT ATCTTTCTAA GGACCTTCAA GTCATGACTC CAGGGAAGCC 
2301 TATTGGGAAG TCTACTAAAA ACTGCCTGAT TTACAAGAAA GACCTGAACT 
2351 TGTGGGCTCC CATTTGATTT TTTTCTCCTC AGGGGACTCA GACATTAGAA
2401 AGAAAAAGCC TCACAGATTT GAAGAACTGG ACCCCCAAAT CAACTCACCT 
2451 GCCTGGAAGC AACTGGGAAA CCCTTCCAAT AAGTCCTGAT AATAAAGCAC
2501 TTCAGGGTCC AAAAAAAAAA 
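
The polyadenylation signal position quoted for this insert can be checked against the listing. The sketch below searches the last 70 bases for the canonical AATAAA hexamer. The function name `find_polya_signal` is hypothetical, and the reading that the reported positions are zero-based offsets is an inference from this listing, not something the report states.

```python
def find_polya_signal(seq: str, motif: str = "AATAAA") -> int:
    """Return the zero-based offset of the last occurrence of motif, or -1."""
    # Remove the spaces that separate the 10-base blocks in the listing.
    return seq.upper().replace(" ", "").rfind(motif)


# Bases 2451-2520 of the insert, copied from the listing above
tail = ("GCCTGGAAGC AACTGGGAAA CCCTTCCAAT AAGTCCTGAT AATAAAGCAC "
        "TTCAGGGTCC AAAAAAAAAA")
tail_offset = 2450  # zero-based offset of the first base in `tail`

print(tail_offset + find_polya_signal(tail))
```

With these inputs the hexamer is found at zero-based offset 2490, which matches the reported signal position. In 1-based coordinates the same hexamer starts at base 2491.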



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 50 bp to 2047 bp; peptide length: 666 
Category: similarity to known protein 
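
The start of this ORF can be translated by hand to confirm the reading frame. The following is a minimal sketch using the standard genetic code. The names `CODON_TABLE` and `translate` are illustrative, and the 30-base fragment is bases 50-79 copied from the nucleotide listing above.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order for each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1 until the first stop codon or the end."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":  # stop codon
            break
        peptide.append(residue)
    return "".join(peptide)


orf_start = "ATGACTGGAACCCCAAAGACTCAAGAAGGA"  # bases 50-79 of the insert
print(translate(orf_start))
```

The result, MTGTPKTQEG, matches the first ten residues of the peptide listed below.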



1 MTGTPKTQEG AKDLEVDMNK TEVTPRLWTT CRDGEVLLRL SKHGPGHETP
51 MTIPEFFRES VNRFGTYPAL ASKNGKKWEI LNFNQYYEAC RKAAKSLIKL
101 GLERFHGVGI LGFNSAEWFI TAVGAILAGG LCVGIYATNS AEACQYVITH
151 AKVNILLVEN DQQLQKILSI PQSSLEPLKA IIQYRLPMKK NNNLYSWDDF
201 MELGRSIPDT QLEQVIESQK ANQCAVLIYT SGTTGIPKGV MLSHDNITWI
251 AGAVTKDFKL TDKHETVVSY LPLSHIAAQM MDIWVPIKIG ALTYFAQADA
301 LKGTLVSTLK EVKPTVFIGV PQIWEKIHEM VKKNSAKSMG LKKKAFVWAR
351 NIGFKVNSKK MLGKYNTPVS YRMAKTLVFS KVKTSLGLDH CHSFISGTAP
401 LNQETAEFFL SLDIPIGELY GLSESSGPHT ISNQNNYRLL SCGKILTGCK
451 NMLFQQNKDG IGEICLWGRH IFMGYLESET ETTEAIDDEG WLHSGDLGQL
501 DGLGFLYVTG HIKEILITAG GENVPPIPVE TLVKKKIPII SNAMLVGDKL
551 KFLSMLLTLK CEMNQMSGEP LDKLNFEAIN FCRGLGSQAS TVTEMVKQQD
601 PLVYKAIQQG INAVNQEAMN NAQRIEKWVI LEKDFSIYGG ELGPMMKLKR
651 HFVAQKYKKQ IDHMYH

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_35kl6, frame 2

TREMBL:AB014531_1 gene: "KIAA0631"; product: "KIAA0631 protein"; Homo
sapiens mRNA for KIAA0631 protein, partial cds., N = 1, Score = 1641, P
= 8.9e-169

PIR:E70937 probable fadD15 - Mycobacterium tuberculosis (strain H37RV),
N = 2, Score = 532, P = 3.6e-62

PIR:H64041 long-chain-fatty-acid--CoA ligase homolog - Haemophilus
influenzae (strain Rd KW20), N = 2, Score = 486, P = 6.5e-59



>TREMBL:AB014531_1 gene: "KIAA0631"; product: "KIAA0631 protein"; Homo
sapiens mRNA for KIAA0631 protein, partial cds.
Length = 634

HSPs:

Score = 1641 (246.2 bits), Expect = 8.9e-169, P = 8.9e-169
Identities = 319/628 (50%), Positives = 440/628 (70%)

Query:  38 LRLSKHGPGHETPMTIPEFFRESVNRFGTYPALASKNGKKWEILNFNQYYEACRKAAKSL 97
           LR+ P + P T+ F E+++++G AL K KWE ++++QYY R+AAK
Sbjct:   2 LRIDPSCP--QLPYTVHRMFYEALDKYGDLIALGFKRQDKWEHISYSQYYLLARRAAKGF 59

Query:  98 IKLGLERFHGVGILGFNSAEWFITAVGAILAGGLCVGIYATNSAEACQYVITHAKVNILL 157
           +KLGL++ H V ILGFNS EWF +AVG + AGG+ GIY T+S EACQY+ N+++
Sbjct:  60 LKLGLKQAHSVAILGFNSPEWFFSAVGTVFAGGIVTGIYTTSSPEACQYIAYDCCANVIM 119

Query: 158 VENDQQLQKILSIPQSSLEPLKAIIQYRLPM-KKNNNLYSWDDFMELGRSIPDTQLEQVI 216
           V+ +QL+KIL I L LKA++ Y+ P K N+Y+ ++FMELG +P+ L+ +I
Sbjct: 120 VDTQKQLEKILKI-WKQLPHLKAVVIYKEPPPNKMANVYTMEEFMELGNEVPEEALDAII 178

Query: 217 ESQKANQCAVLIYTSGTTGIPKGVMLSHDNITWIA--GAVTKDFKLTD-KHETVVSYLPL 273
           ++Q+ NQC VL+YTSGTTG PKGVMLS DNITW A G+ D + + + E VVSYLPL
Sbjct: 179

Query: 274
           SHIAAQ+ D+W I+ GA FA+ DALKG+LV+TL+EV+PT +GVP++WEKI E +++
Sbjct: 239

Query: 334
           +A+S +++K +WA ++ + N P + R+A LV +KV+ +LG
Sbjct: 299

Query: 394
           G AP+ ET FFL L+I + YGLSE+SGPH +S+ NYRL S GK++ GC+ L
Sbjct: 358

Query: 454
           Q+ +GIGEICLWGR IFMGYL E +T EAID+EGWLH+GD G+LD GFLY+TG +K
Sbjct: 418

Query: 514
           E++ITAGGENVPP+P+E VK ++ PIISNAML+GD+ KFLSMLLTLKC ++ + + D
Sbjct: 478

Query: 574
           L +A+ FC+ +GS+A+TV+E+++++D VY+AI++GI VN A I+KW ILE+
Sbjct: 538

Query: 634
           DFSI GGELGP MKLKR V +KYK ID Y
Sbjct: 598



Pedant information for DKFZphtes3_35kl6, frame 2 



Report for DKFZphtes3_35kl6.2



[LENGTH] 666
[MW] 74344.97
[pI] 8.67
[HOMOL] TREMBL:AB014531_1 gene: "KIAA0631"; product: "KIAA0631 protein"; Homo sapiens mRNA for KIAA0631 protein, partial cds. 1e-176
[FUNCAT] i lipid metabolism [H. influenzae, HI0002] 2e-55
[FUNCAT] 08.10 peroxisomal transport [S. cerevisiae, YER015w] 2e-29
[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YER015w] 2e-29
[FUNCAT] 01.06.13 lipid and fatty-acid transport [S. cerevisiae, YER015w] 2e-29
[FUNCAT] 01.06.07 lipid, fatty-acid and sterol utilization [S. cerevisiae, YER015w] 2e-29
[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YMR246w] 2e-23
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YMR246w] 2e-23
[BLOCKS] BL00455
[SCOP] d1lci 5.19.1.1.1 Luciferase [Firefly (Photinus pyralis)] 1e-49
[EC] 1.13.12.7 Photinus-luciferin 4-monooxygenase (ATP-hydrolysing) 9e-17
[EC] 6.2.1.3 Long-chain-fatty-acid--CoA ligase 4e-34
[EC] 5.1.1.11 Phenylalanine racemase (ATP-hydrolysing) 6e-08
[EC] 6.2.1.12 4-Coumarate--CoA ligase 8e-18
[PIRKW] duplication 6e-07
[PIRKW] phosphopantetheine 3e-12
[PIRKW] multifunctional enzyme 3e-06
[PIRKW] ligase 6e-08
[PIRKW] acid-thiol ligase 4e-34
[PIRKW] transmembrane protein 5e-22
[PIRKW] monooxygenase 9e-17
[PIRKW] hydrolase 4e-34
[PIRKW] peroxisome 9e-15
[PIRKW] antibiotic biosynthesis 3e-12
[PIRKW] isomerase 6e-08
[PIRKW] flavonoid biosynthesis 1e-17
[PIRKW] magnesium 9e-15
[PIRKW] ATP 5e-22
[PIRKW] oxidoreductase 9e-17
[PIRKW] liver 2e-31
[SUPFAM] alpha-aminoadipyl-cysteinyl-valine synthetase 3e-07
[SUPFAM] human long-chain-fatty-acid--CoA ligase 4e-34
[SUPFAM] gramicidin S synthetase I 6e-08
[SUPFAM] peptide synthetase ppsE 7e-06
[SUPFAM] gramicidin S synthetase I repeat homology 3e-12
[SUPFAM] peptide synthetase ppsD 2e-07
[SUPFAM] probable acyl-CoA ligase medium chain 2e-09
[SUPFAM] acetate--CoA ligase 8e-10
[SUPFAM] acetate--CoA ligase homology 4e-54
[SUPFAM] surfactin synthetase 3e-12
[SUPFAM] 4-coumarate--CoA ligase 8e-18
[SUPFAM] short-chain alcohol dehydrogenase homology 8e-07
[SUPFAM] acyl carrier protein homology 2e-29
[PROSITE] MYRISTYL 12
[PROSITE] AMP_BINDING 1
[PROSITE] AMIDATION 1
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 9
[PROSITE] TYR_PHOSPHO_SITE 3
[PROSITE] PKC_PHOSPHO_SITE 10
[PROSITE] ASN_GLYCOSYLATION 2
[PFAM] AMP-binding enzymes
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 1.80 %



SEQ MTGTPKTQEGAKDLEVDMNKTEVTPRLWTTCRDGEVLLRLSKHGPGHETPMTIPEFFRES
SEG
1lci-

SEQ VNRFGTYPALASKNGKKWEILNFNQYYEACRKAAKSLIKLGLERFHGVGILGFNSAEWFI
SEG
1lci-

SEQ TAVGAILAGGLCVGIYATNSAEACQYVITHAKVNILLVENDQQLQKILSIPQSSLEPLKA
SEG
1lci-

SEQ IIQYRLPMKKNNNLYSWDDFMELGRSIPDTQLEQVIESQKANQCAVLIYTSGTTGIPKGV
SEG
1lci-

SEQ MLSHDNITWIAGAVTKDFKLTDKHETVVSYLPLSHIAAQMMDIWVPIKIGALTYFAQADA
SEG
1lci-

SEQ LKGTLVSTLKEVKPTVFIGVPQIWEKIHEMVKKNSAKSMGLKKKAFVWARNIGFKVNSKK
SEG
1lci-

SEQ MLGKYNTPVSYRMAKTLVFSKVKTSLGLDHCHSFISGTAPLNQETAEFFLSLDIPIGELY
SEG
1lci- TTTTCEEETTTTCCCHHHHHHHHHHCCCCBCEE

SEQ GLSESSGPHTISNQNNYRLLSCGKILTGCKNMLFQQNKDGIGEICLWGRHIFMGYLESET
SEG
1lci- ECGGGTTEEEECCCCCCEEEEETTTTEEEEETTTTTCEETTEEEEEETTTTCCEETTTHH

SEQ ETTEAIDDEGWLHSGDLGQLDGLGFLYVTGHIKEILITAGGENVPPIPVETLVKKKIPII
SEG xxxxxxxxxxxx..
1lci- HHHHHBTTTTCEEEEEEEEETTTTCEEE ECEEETTEEECHHHHHHHHHHT-TTE

SEQ SNAMLVGDKLKFLSMLLTLKCEMNQMSGEPLDKLNFEAINFCRGLGSQASTVTEMVKQQD
SEG
1lci- EEEEEEE

SEQ PLVYKAIQQGINAVNQEAMNNAQRIEKWVILEKDFSIYGGELGPMMKLKRHFVAQKYKKQ
SEG
1lci-

SEQ IDHMYH
SEG
1lci-



Prosite for DKFZphtes3_35kl6.2 



PS00001 19->23 ASN_GLYCOSYLATION PDOC00001
PS00001 246->250 ASN_GLYCOSYLATION PDOC00001
PS00004 332->336 CAMP_PHOSPHO_SITE PDOC00004
PS00005 4->7 PKC_PHOSPHO_SITE PDOC00005
PS00005 24->27 PKC_PHOSPHO_SITE PDOC00005
PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005
PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005
PS00005 261->264 PKC_PHOSPHO_SITE PDOC00005
PS00005 308->311 PKC_PHOSPHO_SITE PDOC00005
PS00005 335->338 PKC_PHOSPHO_SITE PDOC00005
PS00005 358->361 PKC_PHOSPHO_SITE PDOC00005
PS00005 370->373 PKC_PHOSPHO_SITE PDOC00005
PS00005 558->561 PKC_PHOSPHO_SITE PDOC00005
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006
PS00006 52->56 CK2_PHOSPHO_SITE PDOC00006
PS00006 173->177 CK2_PHOSPHO_SITE PDOC00006
PS00006 196->200 CK2_PHOSPHO_SITE PDOC00006
PS00006 206->210 CK2_PHOSPHO_SITE PDOC00006
PS00006 210->214 CK2_PHOSPHO_SITE PDOC00006
PS00006 308->312 CK2_PHOSPHO_SITE PDOC00006
PS00006 478->482 CK2_PHOSPHO_SITE PDOC00006
PS00006 591->595 CK2_PHOSPHO_SITE PDOC00006
PS00007 659->666 TYR_PHOSPHO_SITE PDOC00007
PS00007 658->666 TYR_PHOSPHO_SITE PDOC00007
PS00007 597->605 TYR_PHOSPHO_SITE PDOC00007
PS00008 3->9 MYRISTYL PDOC00008
PS00008 65->71 MYRISTYL PDOC00008
PS00008 124->130 MYRISTYL PDOC00008
PS00008 130->136 MYRISTYL PDOC00008
PS00008 134->140 MYRISTYL PDOC00008
PS00008 235->241 MYRISTYL PDOC00008
PS00008 239->245 MYRISTYL PDOC00008
PS00008 303->309 MYRISTYL PDOC00008
PS00008 387->393 MYRISTYL PDOC00008
PS00008 421->427 MYRISTYL PDOC00008
PS00008 498->504 MYRISTYL PDOC00008
PS00008 586->592 MYRISTYL PDOC00008
PS00009 74->78 AMIDATION PDOC00009
PS00455 227->239 AMP_BINDING PDOC00427
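
The ASN_GLYCOSYLATION entries above correspond to PROSITE pattern PS00001, N-{P}-[ST]-{P}. As an illustration of how such a pattern hit arises, the sketch below scans the first 30 residues of the peptide with an equivalent regular expression. The function name `glyco_sites` is hypothetical, and this is not the program that produced the table.

```python
import re

# PS00001: Asn, any residue except Pro, Ser or Thr, any residue except Pro.
# The lookahead lets overlapping matches be reported as well.
PS00001 = re.compile(r"(?=N[^P][ST][^P])")


def glyco_sites(peptide: str) -> list[int]:
    """Return 1-based start positions of PS00001 matches in the peptide."""
    return [m.start() + 1 for m in PS00001.finditer(peptide)]


n_terminus = "MTGTPKTQEGAKDLEVDMNKTEVTPRLWTT"  # residues 1-30 of frame 2
print(glyco_sites(n_terminus))
```

For this fragment the only match is N-K-T-E starting at residue 19, in agreement with the first PS00001 row of the table.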



Pfam for DKFZphtes3_35kl6.2



HMM_NAME AMP-binding enzymes

HMM       *TYRELNERANRLARHLRsekGIrPGDiVgIMMDRSMWMIVaMLGIWKAG
          + + +E +A L+ +G VGI+ +S + ++ G + AG
Query  82 NFNQYYEACRKAAKSLI-KLGLERFHGVGILGFNSAEWFITAVGAILAG 129

HMM       GAYVPIDPeYPdERIqYMLEDSGArLLITQrh HmqRI PdemwwvdH
          G +V I +E QY++ ++ + +L+++ + + IP++++ +
Query 130 GLCVGIYATNSAEACQYVITHAKVNILLVENDQQLQKILSIPQSSLEPLK 179

HMM       IiviDWe WddlWWHedeeNpqpWvdPeDLAYIIY
          + + + ++++ + E ++ ++++ A +IY
Query 180 AIIQYRLPMKKNNNLYSWDDFMELGRSIPDTQLEQVIESQKANQCAVLIY 229

HMM       TSGTTGKPKGVMIEHrNIvNycqWMnWRYgMteeDDRILWFtSDpYWFDa
          TSGTTG PKGVM++H NI+ + +++ +T+ + ++ + + ++ A
Query 230 TSGTTGIPKGVMLSHDNITWIAGAVTKDFKLTDKHETVVSYLP-LSHIAA 278

HMM       SVWDMFWpLLnGaTLYIpPeEtRrDPe rWWqYIqRHglTWWylTPSMFRM
          +++D++ P+ GA Y + ++ + ++++ ++T+ ++P +++
Query 279 QMMDIWVPIKIGALTYFAQADAL--KGTLVSTLKEVKPTVFIGVPQIWEK 326

HMM       LMpd .
          + +
Query 327 IHEMVKKNSAKSMGLKKKAFVWARNIGFKVNSKKMLGKYNTPVSYRMAKT 376

HMM       psLRhVMFgGEpLsPehWdWWRkrfgf kgRIINMYWPT
          ++ + +++G PL++E+++ ++ + ++I Y+ +
Query 377 LVFSKVKTSLGLDHCHSFISGTAPLNQETAEFFL-SLD--IPIGELYGLS 423

HMM       ETTVWtTwMrliPdepeqWrwiPIGRPIpNTqWYIMDdnMQIQPiGViGE
          E++ t+ + + R +++G+ + + + + +N G IGE
Query 424 ESSGPHTISNQNN--Y---RLLSCGKILTGCKNMLFQQNKDG-IGE 463

HMM       LYIgGWPGVARGYWNRPELTEERFipNPFWPGEYRrGWNrRMYRTGDLAR
          +++ G ++ GY+ + +T E+ + ++ ++GDL++
Query 464 ICLWG-RHIFMGYLESETETTEAIDDEGWLHSGDLGQ 499

HMM       WIPDGnlEYLGRID. DQVKIRGYRIELGEIEhqLr . qHPglqEAW*
          + G+++ G I + G +++ + +E+ + ++P I+ A
Query 500 LDGLGFLYVTGHIKEILITAGGENVPPIPVETLVKKKIPIISNAML 545

DKFZphtes3_35k24



group: transmembrane protein 

DKFZphtes3_35k24 encodes a novel 514 amino acid protein without similarity to known proteins.
The novel protein contains 5 transmembrane regions.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes and as a new marker for testicular cells.



unknown

membrane regions: 5

Summary: DKFZphtes3_35k24 encodes a novel 514 amino acid protein.
No homologues found in bacteria, yeast or C. elegans; specific for
mammals?



unknown 

complete cDNA, complete cds, few EST hits 

Sequenced by DKFZ 

Locus: unknown

Insert length: 2706 bp

Poly A stretch at pos. 2696, polyadenylation signal at pos. 2675



1 CCGTGTGCAG TCGCCCCGCG CCCCGCGCGA CCCTTCGGGT AAACTACGAA 
51 CTGGGAGTTC TGAAGAATGG GTAAAGACTT TCGTTACTAT TTCCAGCATC 
101 CCTGGTCTCG CATGATTGTG GCTTACTTGG TGATCTTCTT TAACTTCTTA 
151 ATATTTGCGG AGGACCCAGT TTCTCATAGC CAAACAGAAG CCAATGTTAT 
201 TGTTGTTGGA AACTGTTTTT CATTTGTTAC AAATAAATAC CCTAGAGGAG 
251 TTGGCTGGAG GATTTTGAAG GTGCTTCTAT GGCTACTTGC CATTCTCACA 
301 GGACTAATAG CTGGCAAATT TCTGTTCCAT CAGCGTTTGT TTGGTCAGTT 
351 GCTCCGATTA AAAATGTTTC GAGAAGATCA TGGGTCGTGG ATGACAATGT 
401 TCTTCAGCAC AATTCTCTTT CTCTTCATAT TTTCTCACAT ATACAACACG 
451 ATTCTTCTAA TGGATGGGAA CATGGGAGCA TATATCATTA CAGACTATAT 
501 GGGCATCCGA AATGAAAGTT TCATGAAATT AGCTGCAGTA GGGACCTGGA 
551 TGGGGGACTT TGTCACAGCT TGGATGGTCA CTGATATGAT GCTTCAGGAC 
601 AAACCCTATC CTGACTGGGG AAAATCAGCA AGAGCTTTCT GGAAGAAAGG 
651 AAATGTTAGG ATCACTTTAT TCTGGACAGT TCTTTTTACT CTGACGTCTG 
701 TGGTTGTACT TGTGATTACA ACGGACTGGA TCAGCTGGGA CAAGCTGAAT 
751 CGGGGATTTT TGCCCAGTGA TGAAGTTTCC AGAGCATTCC TTGCTTCTTT 
801 TATCTTGGTC TTTGACCTTC TTATTGTGAT GCAGGACTGG GAATTCCCAC 
851 ATTTCATGGG AGATGTTGAT GTAAATCTCC CTGGTTTGCA CACCCCTCAC 
901 ATGCAGTTCA AGATTCCTTT CTTCCAGAAA ATCTTCAAGG AGGAATATCG 
951 TATTCACATA ACAGGCAAAT GGTTTAACTA TGGAATTATC TTCCTCGTCT 
1001 TGATTTTGGA TCTTAATATG TGGAAGAACC AAATATTTTA TAAACCTCAT 
1051 GAATATGGGC AATATATCGG CCCGGGGCAG AAGATATATA CAGTGAAAGA 
1101 CTCAGAAAGT TTAAAAGATT TGAACAGAAC CAAGCTATCC TGGGAATGGA 
1151 GGTCCAATCA CACTAACCCT CGGACTAATA AAACATATGT TGAGGGAGAC 
1201 ATGTTCTTAC ACAGCAGGTT CATAGGAGCC AGTCTTGATG TCAAGTGTCT 
1251 GGCCTTTGTT CCAAGCCTGA TAGCCTTTGT GTGGTTTGGA TTCTTTATTT 
1301 GGTTCTTTGG ACGATTTTTG AAAAATGAGC CACGCATGGA GAATCAAGAC 
1351 AAAACTTACA CTCGCATGAA AAGAAAATCT CCATCAGAAC ATAGCAAAGA 
1401 CATGGGAATC ACTCGAGAAA ACACCCAGGC TTCAGTAGAA GACCCCTTGA 
1451 ATGACCCTTC TTTGGTTTGC ATCAGGTCTG ACTTCAATGA GATCGTCTAC 
1501 AAGTCTTCCC ACCTAACCTC GGAAAACTTG AGCTCACAGT TGAACGAATC 
1551 TACTAGTGCA ACAGAAGCTG ATCAAGACCC AACGACTTCT AAAAGTACAC 
1601 CTACGAACTA GACTCGGAGA TAGACTTGGA GATAACACAA AAAGCAACCT 
1651 TGAGTGTAAC TTTAAAAATT TAGTCTTTCC TTTTGTATAT GTAAGGTTTA 
1701 CGTAGTGTTA GGTAAAAATA TGAACAATGC CACAACGGTG CTCAACATGC 
1751 TTTTTCTAGG ATTCATTGTT TTCTATTTGT ATTATAATAC ACGTGCCTAC 
1801 TGTATACTCA ACAGTCCTCT AGAGATTGCT TTTCACAATT GCACAAGCTA 
1851 TTACTGACTT TACAGCATAG TGGAAGATTA GCTGATGACC CATGTATCTG 
1901 ATGTTCAACC ATAGTGGTGC CTTGAGACAT TAAACTGTTT TTAACTGTAC 
1951 CAGAAATGAA GTGTGGAACA GTTACCTAAC CTATTTCACA TGGGCGTTTT 
2001 GTATACAACT ATTTTGATCT ACACTTGATG TCTGAGCAGA AAACAGAAAT 
2051 AGCTAAATGT GACTCAGGAA GTATCTCTTG GTTTCTTATT CAGCAGCAGA 
2101 GTTGGTGACT TTGACAACTG GACTGCAGAG AAACATGGTG ATCACCTTTT 
2151 AATTTTTATT GGCTGTCTGC CAAATATAAA TACAGATGCA AAATTCAGTA 
2201 ATAGGAGATC CATAACCCAA CATGGGTCAC TACTCGTGAA ATGTGACTTT 
2251 CTCCCACCAG TAATTGAAAT GAGGTGATGA TACCTAATTA TGTTTTCCTA 
2301 ATTAAAGATA AATTGCTACT TGATTAAAAA TCCTGCCCTT CACCTTTGGG 
2351 AACAAAGGTT AAGAGACACA GTTGGGCGAA CTCTCAAATT TATTGGCATT
2401 TACACAAAGT CCCAGACAAC CAAGGAACTG AAGTTTTCAT CATATGAGAG
2451 CAGCACATCC CACCATTTAC AATATTCGTA TATCTTTCTG CAAATATGGC
2501 TCTGGATAGT GAAAATTGAA AAACATATGC CAACCCTGAG CAAGGGAACT
2551 CCTCAAAAAA TCATGCAGCG GAACCTTGTC AGGTAGAGAA GCCGTGCATG
2601 AAAGAATTTG TTTAATGTCT TGTTTTGCGT ATGTGTTTTT TGTTTTTGTT
2651 TTTTAAGAAC TAAATATTGC ACATTAATAA ATAAGAATTA TACAGCAAAA
2701 AAAAAA



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 67 bp to 1608 bp; peptide length: 514 
Category: putative protein 



1 MGKDFRYYFQ HPWSRMIVAY LVIFFNFLIF AEDPVSHSQT EANVIVVGNC
51 FSFVTNKYPR GVGWRILKVL LWLLAILTGL IAGKFLFHQR LFGQLLRLKM 
101 FREDHGSWMT MFFSTILFLF IFSHIYNTIL LMDGNMGAYI ITDYMGIRNE 
151 SFMKLAAVGT WMGDFVTAWM VTDMMLQDKP YPDWGKSARA FWKKGNVRIT 
201 LFWTVLFTLT SVVVLVITTD WISWDKLNRG FLPSDEVSRA FLASFILVFD 
251 LLIVMQDWEF PHFMGDVDVN LPGLHTPHMQ FKIPFFQKIF KEEYRIHITG 
301 KWFNYGIIFL VLILDLNMWK NQIFYKPHEY GQYIGPGQKI YTVKDSESLK 
351 DLNRTKLSWE WRSNHTNPRT NKTYVEGDMF LHSRFIGASL DVKCLAFVPS 
401 LIAFVWFGFF IWFFGRFLKN EPRMENQDKT YTRMKRKSPS EHSKDMGITR 
451 ENTQASVEDP LNDPSLVCIR SDFNEIVYKS SHLTSENLSS QLNESTSATE
501 ADQDPTTSKS TPTN 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35k24, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_35k24, frame 1 



Report for DKFZphtes3_35k24.1



[LENGTH] 514
[MW] 60185.03
[pI] 8.67
[PROSITE] MYRISTYL 5
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 8
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 7
[PROSITE] ASN_GLYCOSYLATION 6
[KW] SIGNAL_PEPTIDE 32
[KW] TRANSMEMBRANE 5
[KW] LOW_COMPLEXITY 15.37 %



SEQ MGKDFRYYFQHPWSRMIVAYLVIFFNFLIFAEDPVSHSQTEANVIVVGNCFSFVTNKYPR

SEG 

PRD cccceeeeeecccchhhhhhhhhhhhhhhhccccccccccceeeeeecccceeeeccccc 

MEM 

SEQ GVGWRILKVLLWLLAILTGLIAGKFLFHQRLFGQLLRLKMFREDHGSWMTMFFSTILFLF

SEG xxxxxxxxxxxxxxxxx xxxxxxxxxxxx 

PRD cchhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhccccceeeehhhhhhhh 

MEM MMMMMMMMMMMMMMMMM MMMMM 

SEQ IFSHIYNTILLMDGNMGAYIITDYMGIRNESFMKLAAVGTWMGDFVTAWMVTDMMLQDKP

SEG xxx 

PRD hhhhhhhhhhccccccceeeeecccccchhhhhhhhhhccccccccchhhhhhhhhhccc 

MEM MMMMMMMMMMMM 

SEQ YPDWGKSARAFWKKGNVRITLFWTVLFTLTSVVVLVITTDWISWDKLNRGFLPSDEVSRA 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD cccccchhhhhhhcccceeehhhhhhhhhhhheeeeecccccccccccccccccchhhhh 

MEM MMMMMMMMMMMMMMMMM M 

SEQ FLASFILVFDLLIVMQDWEFPHFMGDVDVNLPGLHTPHMQFKIPFFQKIFKEEYRIHITG

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhcccccccccccccccccccccccccchhhhhhhhhhhhhhcccc 

MEM MMMMMMMMMMMMMMMM 

SEQ KWFNYGIIFLVLILDLNMWKNQIFYKPHEYGQYIGPGQKIYTVKDSESLKDLNRTKLSWE 

SEG 

PRD ccceeeeeehhhhhhhcccccceeeccccccccccccceeeeecccccccccccchhhhh 



MEM 



SEQ WRSNHTNPRTNKTYVEGDMFLHSRFIGASLDVKCLAFVPSLIAFVWFGFFIWFFGRFLKN 

SEG xxxxxxxxxxxxxx...

PRD hhcccccccccccccccchhhhhhccccccceeeeeehhhhheeeeccceeeeeeeeccc 

MEM MMMMMMMMMMMMMMMMM 

SEQ EPRMENQDKTYTRMKRKSPSEHSKDMGITRENTQASVEDPLNDPSLVCIRSDFNEIVYKS 

SEG

PRD cccccccccchhhhhhccccccccccceeeccccccccccccccceeeeccccceeeeec 

MEM 



SEQ SHLTSENLSSQLNESTSATEADQDPTTSKSTPTN 

SEG 

PRD cccccccccccccccccccccccccccccccccc 

MEM 

PS00001 149->153 ASN_GLYCOSYLATION PDOC00001
PS00001 353->357 ASN_GLYCOSYLATION PDOC00001
PS00001 364->368 ASN_GLYCOSYLATION PDOC00001
PS00001 371->375 ASN_GLYCOSYLATION PDOC00001
PS00001 487->491 ASN_GLYCOSYLATION PDOC00001
PS00001 493->497 ASN_GLYCOSYLATION PDOC00001
PS00004 435->439 CAMP_PHOSPHO_SITE PDOC00004
PS00005 55->58 PKC_PHOSPHO_SITE PDOC00005
PS00005 187->190 PKC_PHOSPHO_SITE PDOC00005
PS00005 299->302 PKC_PHOSPHO_SITE PDOC00005
PS00005 342->345 PKC_PHOSPHO_SITE PDOC00005
PS00005 348->351 PKC_PHOSPHO_SITE PDOC00005
PS00005 370->373 PKC_PHOSPHO_SITE PDOC00005
PS00005 507->510 PKC_PHOSPHO_SITE PDOC00005
PS00006 38->42 CK2_PHOSPHO_SITE PDOC00006
PS00006 342->346 CK2_PHOSPHO_SITE PDOC00006
PS00006 348->352 CK2_PHOSPHO_SITE PDOC00006
PS00006 373->377 CK2_PHOSPHO_SITE PDOC00006
PS00006 438->442 CK2_PHOSPHO_SITE PDOC00006
PS00006 456->460 CK2_PHOSPHO_SITE PDOC00006
PS00006 497->501 CK2_PHOSPHO_SITE PDOC00006
PS00006 499->503 CK2_PHOSPHO_SITE PDOC00006
PS00007 326->334 TYR_PHOSPHO_SITE PDOC00007
PS00008 48->54 MYRISTYL PDOC00008
PS00008 79->85 MYRISTYL PDOC00008
PS00008 106->112 MYRISTYL PDOC00008
PS00008 134->140 MYRISTYL PDOC00008
PS00008 159->165 MYRISTYL PDOC00008



(No Pfam data available for DKFZphtes3_35k24.1)

DKFZphtes3_35nl2



group: metabolism 

DKFZphtes3_35nl2 encodes a novel 315 amino acid protein with strong similarity to ADP, ATP 
carrier T (ANT) proteins. 

The novel protein contains three mitochondrial energy transfer signatures and is closely
related to the ADP/ATP translocator, or adenine nucleotide translocator (ANT), a protein most
abundant in mitochondria. In its functional state, it is a homodimer of 30-kD subunits
embedded asymmetrically in the inner mitochondrial membrane. The dimer forms a gated pore
through which cytosolic ADP is exchanged for ATP from the mitochondrial matrix.

The new protein can find application in modulation of ADP-transport and energy metabolism in 
cells/mitochondria. 



strong similarity to ADP/ATP carrier proteins 

EST hits to mouse and drosophila 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1803 bp 

Poly A stretch at pos. 1793, polyadenylation signal at pos. 1772



1 AGCGTCCCAA GAGCCACTTT CTCGCCAGTA CGATGCTGCA GCGGTTTTCC 
51 GGTTTTCCGC TTCCCTTCAT CGTAGCTCCC GTACTCATTT TTAGCCACTG 
101 CTGCCGGTTT TTATATCCTT CTCCATCATG CATCGTGAGC CTGCGAAAAA 
151 GAAGGCAGAA AAGCGGCTGT TTGACGCCTC ATCCTTCGGG AAGGACCTTC 
201 TGGCCGGCGG AGTCGCGGCA GCTGTGTCCA AGACAGCGGT GGCGCCCATC 
251 GAGCGGGTGA AGCTGCTGCT GCAGGTGCAG GCGTCGTCGA AGCAGATCAG 
301 CCCCGAGGCG CGGTACAAAG GCATGGTGGA CTGCCTGGTG CGGATTCCTC 
351 GCGAGCAGGG TTTCTTCAGT TTTTGGCGTG GCAATTTGGC AAATGTTATT 
401 CGGTATTTTC CAACACAAGC TCTAAACTTT GCTTTTAAGG ACAAATACAA 
451 GCAGCTATTC ATGTCTGGAG TTAATAAAGA AAAACAGTTC TGGAGGTGGT
501 TTTTGGCAAA CCTGGCTTCT GGTGGAGCTG CTGGGGCAAC ATCCTTATGT 
551 GTAGTATATC CTCTAGATTT TGCCCGAACC CGATTAGGTG TCGATATTGG 
601 AAAAGGTCCT GAGGAGCGAC AATTCAAGGG TTTAGGTGAC TGTATTATGA 
651 AAATAGCAAA ATCAGATGGA ATTGCTGGTT TATACCAAGG GTTTGGTGTT 
701 TCAGTACAGG GCATCATTGT GTACCGAGCC TCTTATTTTG GAGCTTATGA 
751 CACAGTTAAG GGTTTATTAC CAAAGCCAAA GAAAACTCCA TTTCTTGTCT 
801 CCTTTTTCAT TGCTCAAGTT GTGACTACAT GCTCTGGAAT ACTTTCTTAT 
851 CCCTTTGACA CAGTTAGAAG ACGTATGATG ATGCAGAGTG GTGAGGCTAA 
901 ACGGCAATAT AAAGGAACCT TAGACTGCTT TGTGAAGATA TACCAACATG 
951 AAGGAATCAG TTCCTTTTTT CGTGGCGCCT TCTCCAATGT TCTTCGCGGT 
1001 ACAGGGGGTG CTTTGGTGTT GGTATTATAT GATAAAATTA AAGAATTCTT 
1051 TCATATTGAT ATTGGTGGTA GGTAATCGGG AGAGTAAATT AAGAAATAAC 
1101 ATGGATTTAA CTTGTTAAAC ATACAAATTA CATAGCTGCC ATTTGCATAC 
1151 ATTTTGATAG TGTTATTGTC TGTATTTTGT TAAAGTGCTA GTTCTGCAAT 
1201 AAAGCATACA TTTTTTCAAG AATTTAAATA CTAAAAATCA GATAAATGTG 
1251 GATTTTCCTC CCACTTAGAC TCAAACACAT TTTAGTGTGA TATTTCATTT
1301 ATTATAGGTA GTATATTTTA ATTTGTTAGT TTAAAATTCT TTTTATGATT 
1351 AAAAATTAAT CATATAATCC TAGATTAATG CTGAAATCTA GGAAATGAAA 
1401 GTAGCGTCTT TTAAATTGCT ATTCATTTAA TATACCTGTT TTCCCATCTT 
1451 TTGAAGTCAT ATGGTATGAC ATATTTCTTA AAAGCTTATC AATAGATGTC
1501 ATCATATGTG TAGGCAGAAA TAAGCTTTGT TCTATATCTC TTCTAAGACA 
1551 GTTGTTATTA CTGTGTATAA TATTTACAGT ATCAGCCTTT GATTATAGAT 
1601 GTGATCATTT AAAATTTGAT AATGACTTTA GTGACATTAT AAAACTGAAA 
1651 CTGGAAAATA AAATGGCTTA TCTGCTGATG TTTATCTTTA AAATAAATAA 
1701 AATCTTGCTA GTGTGAATAT ATCTTAGAAC AAAAGGTATC CTCTTGAAAA 
1751 TTAGTTTGTA TATTTTGTTG ACAATAAAGG AAGCTTAACT GTTAAAAAAA 
1801 AAA 



BLAST Results 



No BLAST result 



Medline entries 



96289608: 

Molecular biological and quantitative abnormalities of 
ADP/ATP carrier protein in cardiomyopathic hamsters. 

Peptide information for frame 2



ORF from 128 bp to 1072 bp; peptide length: 315 

Category: strong similarity to known protein 

Classification: Metabolism 

Prosite motifs: MITOCH_CARRIER (40-50)
MITOCH_CARRIER (145-155)
MITOCH_CARRIER (242-252)



1 MHREPAKKKA EKRLFDASSF GKDLLAGGVA AAVSKTAVAP IERVKLLLQV
51 QASSKQISPE ARYKGMVDCL VRIPREQGFF SFWRGNLANV IRYFPTQALN 
101 FAFKDKYKQL FMSGVNKEKQ FWRWFLANLA SGGAAGATSL CVVYPLDFAR 
151 TRLGVDIGKG PEERQFKGLG DCIMKIAKSD GIAGLYQGFG VSVQGIIVYR 
201 ASYFGAYDTV KGLLPKPKKT PFLVSFFIAQ VVTTCSGILS YPFDTVRRRM
251 MMQSGEAKRQ YKGTLDCFVK IYQHEGISSF FRGAFSNVLR GTGGALVLVL 
301 YDKIKEFFHI DIGGR 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35nl2, frame 2 

PIR:S37210 ADP, ATP carrier protein T1 - mouse, N = 1, Score = 1127, P =
2.7e-114

PIR:A44778 ADP, ATP carrier protein T1 - human, N = 1, Score = 1125, P =
4.4e-114

TREMBL:DMADPATPT_2 product: "ADP/ATP translocase"; Drosophila
melanogaster gene encoding ADP/ATP translocase, N = 1, Score = 1124, P
= 5.6e-114

PIR:XWBO ADP, ATP carrier protein T1 - bovine, N = 1, Score = 1121, P =
1.2e-113



>PIR:S37210 ADP, ATP carrier protein T1 - mouse
Length = 298

HSPs:

Score = 1127 (169.1 bits), Expect = 2.7e-114, P = 2.7e-114
Identities = 214/293 (73%), Positives = 248/293 (84%)



Query: 17, Sbjct: 5 
A SF KD LAGG+AAAVSKTAVAPIERVKLLLQVQ +SKQIS E +YKG++DC+VRIP+E 

Query: 77, Sbjct: 65 
QCF S FWRGNLAN V I RYFPTQALNFAFKDK YKQ+ F+ GV++ KQFWR+F NLASGGAAG 

Query: 137, Sbjct: 125 
ATSLC VYPLDFARTRL D+GKG +R+F GLGDC+ KI KSDG+ GLYQGF VSVQGI 

Query: 197, Sbjct: 185 
I+YRA+YFG YDT KG+LP PK 
+VS+ IAQ VT +G++SYPFDTVRRRMMMQSG 

Query: 257, Sbjct: 245 
Y GTLDC+ KI + EG ++FF+GA+SNVLRG GGA VLVLYD+IK++ 
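
The percentages printed next to the identity and positive counts in the HSP headers of this report appear to be truncated rather than rounded (248/293 is about 84.6%, printed as 84%). A small sketch of that arithmetic follows; the truncation rule is an inference from the figures in this document, and `hsp_percent` is a hypothetical helper name.

```python
def hsp_percent(matches: int, aligned_length: int) -> int:
    """Percentage of matching positions, truncated toward zero.

    Truncation (not rounding) is an assumption inferred from the
    figures printed in this report.
    """
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return 100 * matches // aligned_length


print(hsp_percent(214, 293))  # identities
print(hsp_percent(248, 293))  # positives
```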



Pedant information for DKFZphtes3_35n12, frame 2 



Report for DKFZphtes3_35n12.2 



[LENGTH] 315 
[MW] 35022.03 
[pI] 9.91 
[HOMOL] PIR:S37210 ADP, ATP carrier protein T1 - mouse 1e-115 
[FUNCAT] 07.16 purine and pyrimidine transporters [S. cerevisiae, YBL030c] 2e-72 
[FUNCAT] 08.04 mitochondrial transport [S. cerevisiae, YBL030c] 2e-72 
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YBL030c] 2e-72 
[FUNCAT] 01.03.19 nucleotide transport [S. cerevisiae, YBL030c] 2e-72 
[FUNCAT] 01.07.10 transport of vitamins, cofactors, and prosthetic groups [S. cerevisiae, YIL006w] 2e-14 
[FUNCAT] 07.99 other transport facilitators [S. cerevisiae, YIL006w] 2e-14 
[FUNCAT] 01.05.07 carbohydrate transport [S. cerevisiae, YPR021c] 5e-14 
[FUNCAT] 07.07 sugar and carbohydrate transporters [S. cerevisiae, YPR021c] 5e-14 
[FUNCAT] 07.04.07 anion transporters (Cl, SO4, PO4, etc.) [S. cerevisiae, YKL120w] 1e-13 
[FUNCAT] 02.13 respiration [S. cerevisiae, YBR192w] 4e-13 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YJR095w] 6e-12 
[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YLR348c] 4e-10 
[FUNCAT] 01.04.07 phosphate transport [S. cerevisiae, YLR348c] 4e-10 
[FUNCAT] 01.01.07 amino-acid transport [S. cerevisiae, YOR130c] 1e-06 
[FUNCAT] 07.10 amino-acid transporters [S. cerevisiae, YOR130c] 1e-06 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPR128c] 2e-06 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YKR052c] 2e-06 
[BLOCKS] BL00215B Mitochondrial energy transfer proteins 
[BLOCKS] BL00215A Mitochondrial energy transfer proteins 
[PIRKW] duplication 1e-115 
[PIRKW] phosphate transport 2e-09 
[PIRKW] heart 3e-24 
[PIRKW] transmembrane protein 1e-115 
[PIRKW] mitochondrial inner membrane 7e-72 
[PIRKW] transport protein 4e-08 
[PIRKW] acetylated amino end 1e-115 
[PIRKW] adipose tissue 5e-13 
[PIRKW] mitochondrion 1e-115 
[PIRKW] alternative splicing 2e-09 
[PIRKW] methylated amino acid 1e-115 
[PIRKW] chloroplast 2e-14 
[PIRKW] homodimer 1e-115 
[SUPFAM] hypothetical protein YFR045w 3e-07 
[SUPFAM] ADP, ATP carrier protein 1e-115 
[SUPFAM] Bt1 protein 2e-14 
[SUPFAM] ADP, ATP carrier protein repeat homology 1e-115 
[SUPFAM] probable carrier protein YPR021c 1e-12 
[PROSITE] MITOCH_CARRIER 3 
[PFAM] Mitochondrial carrier proteins 
[KW] TRANSMEMBRANE 2 
[KW] LOW_COMPLEXITY 4.76 % 
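
The low-complexity percentage in the report is consistent with the number of masked positions (the `x` characters on the SEG track) divided by the peptide length. The sketch below shows that calculation; `low_complexity_percent` is an illustrative name and the two-decimal rounding is an assumption based on the printed value.

```python
def low_complexity_percent(seg_rows, peptide_length):
    """Share of residues masked as low complexity, in percent.

    seg_rows: the SEG track lines, where 'x' marks a masked residue.
    Rounding to two decimals is an assumption matching the report's format.
    """
    masked = sum(row.lower().count("x") for row in seg_rows)
    return round(100.0 * masked / peptide_length, 2)


# The SEG track of this 315-residue peptide carries a single run of 15 'x'.
print(low_complexity_percent(["xxxxxxxxxxxxxxx"], 315))
```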



SEQ MHREPAKKKAEKRLFDASSFGKDLLAGGVAAAVSKTAVAPIERVKLLLQVQASSKQISPE 
PRD ccchhhhhhhhhhhhhchhhhhhhhhchhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhh 

SEQ ARYKGMVDCLVRIPREQGFFSFWRGNLANVIRYFPTQALNFAFKDKYKQLFMSGVNKEKQ 
PRD hhhhhhhheeeeccccceeeeecccccceeeeecccchhhhhhhhhhhhhhccccccccc 

SEQ FWRWFLANLASGGAAGATSLCVVYPLDFARTRLGVDIGKGPEERQFKGLGDCIMKIAKSD 
SEG xxxxxxxxxxxxxxx 
PRD eeeecccccccccccceeeeeeeccchhhhhhhhhhccccchhhhhhcccceeeeeeccc 

SEQ GIAGLYQGFGVSVQGIIVYRASYFGAYDTVKGLLPKPKKTPFLVSFFIAQVVTTCSGILS 
PRD cccccccccceeeccceeehhhhhccccccccccccccccccchhhhhhhhhhheeeeec 
MEM . . . . MMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMM 

SEQ YPFDTVRRRMMMQSGEAKRQYKGTLDCFVKIYQHEGISSFFRGAFSNVLRGTGGALVLVL 
PRD cccchhhhhhhhhcccceeeecccchhhhhhhhhcccccccccchhhhhccccceeeeee 
MEM MMMMMMMMMMM 

SEQ YDKIKEFFHIDIGGR 
PRD hhhhhhheeeecccc 



Prosite for DKFZphtes3_35n12.2 

PS00215 40->50 MITOCH_CARRIER PDOC00189 
PS00215 145->155 MITOCH_CARRIER PDOC00189 
PS00215 242->252 MITOCH_CARRIER PDOC00189 



Pfam for DKFZphtes3_35n12.2 

HMM_NAME Mitochondrial carrier proteins 

HMM      *pFwkdFLAGGIAGmMeHTvMFPIDtlKTRMQIQgEMpM..ahpRYkGMI 
Query  19 SFGKDLLAGGVAAAVSKTAVAPIERVKLLLQVQASSKQISPEARYKGMV 67 

HMM       dCFRwIwkNEGWRGLWRGLgANvIRYIPqWalRFGFYEFMKeMFiDyfge 
          DC +I++++G++++WRG++ANVIRY+P++A++F+F++ +K +F + +++ 
Query  68 DCLVRIPREQGFFSFWRGNLANVIRYFPTQALNFAFKDKYKQLFMSGVNK 117 

HMM       ddnyWmWFwmnYMaGsmAGEwisvIitYPMWvVKTRLQaDqkHphsQp.R 
          ++W+WF+ N+++G++AG ++S+ ++YP+++++TRL D +++++ R 
Query 118 EKQFWRWFLANLASGGAAG-ATSLCVVYPLDFARTRLGVD--IGKGPEER 164 

HMM       hYNGvWNcWrklYReEGgFkGLYRGWtPTWMRMIPYqraiYFf vYEtLKeW 
          +++G+ +C KI +++G ++GLY+G++ +++++I+Y++ YF++Y+T K + 
Query 165 QFKGLGDCIMKIAKSDG-IAGLYQGFGVSVQGIIVYRASYFGAYDTVKGL 213 

HMM       lynYtgYnPgprelCMddsPwWhWilgWmlAGMiaWivSYPfDVVRTRMM 
          L +++ + ++ ++++I+SYPFD+VR+RMM 
Query 214 251 

HMM       Mdsm.edhkYqSmlDCWMqIYKnEGFkGFWKGFWPRIMRiMPWtAIMFml 
          M+s + ++++Y+++LDC+++IY++EG+ +F++G+ +++R+ ++A+++++ 
Query 252 MQSGEAKRQYKGTLDCFVKIYQHEGISSFFRGAFSNVLRGT-GGALVLVL 300 

HMM       YEqMKwFL* 
          Y+ +K+F+ 
Query 301 YDKIKEFF 308 



DKFZphtes3_35n24 



group: testes derived 

DKFZphtes3_35n24 encodes a novel 365 amino acid protein without similarity to known proteins. 

The novel protein contains a Prosite Ig (immunoglobulin)-MHC pattern. This pattern represents a 
domain approximately one hundred amino acids long that includes a conserved intra-domain 
disulfide bond (Ig domain). Thus, the novel protein is a new member of the Ig superfamily. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1589 bp 

Poly A stretch at pos. 1579, polyadenylation signal at pos. 1560 



1 CGATCGTCAC GTGACGCCGG GGTTCAGCGT ATCCTTGCTG GGCAACCGTC 
51 TTAGAGACCA GCACTGCTGG CTGCACCATG AATGTGATCT ACCCACTGGC 
101 AGTCCCCAAG GGGCGCAGAC TCTGCTGTGA GGTGTGCGAA GCCCCAGCCG 
151 AGCGGGTGTG CGCGGCCTGC ACAGTCACTT ATTACTGTGG GGTGGTACAT 
201 CAGAAGGCTG ACTGGGACAG CATCCATGAG AAAATATGTC AGCTCTTGAT 
251 TCCACTGCGC ACTTCCATGC CCTTCTACAA TTCAGAGGAA GAACGGCAGC 
301 ATGGCCTGCA GCAGCTGCAG CAGCGGCAGA AGTATTTGAT TGAATTCTGC 
351 TACACCATAG CCCAGAAATA CCTCTTTGAA GGGAAACACG AAGATGCTGT 
401 ACCAGCAGCT TTGCAGTCCC TTCGCTTCCG TGTGAAGCTG TATGGCCTGA 
451 GCTCCGTAGA GCTTGTGCCT GCTTACCCGC TGTTGGCCGA GGCCAGCCTT 
501 GGTCTGGGCC GAATCGTTCA GGCTGAAGAA TATCTATTCC AAGCCCAGTG 
551 GACAGTCCTC AAATCAACTG ACTGTAGTAA TGCCACCCAC TCTTTACTGC 
601 ATCGGAATCT GGGACTTCTC TATATAGCTA AGAAAAACTA TGAAGAGGCC 
651 CGTTATCATC TGGCCAATGA TATTTATTTT GCCAGTTGTG CATTTGGAAC 
701 AGAGGACATT AGGACTTCAG GAGGCTACTT CCACCTGGCT AATATATTCT 
751 ATGACCTTAA AAAGTTGGAC CTGGCAGACA CATTGTACAC CAAGGTCTCT 
801 GAGATCTGGC ATGCATATTT GAACAATCAC TATCAAGTCC TCTCACAGGC 
851 TCACATCCAA CAAATGGATT TACTGGGCAA ACTATTTGAG AATGACACTG 
901 GCTTGGATGA AGCCCAAGAA GCAGAAGCCA TTCGCATCCT GACTTCAATC 
951 TTGAACATTC GAGAATCTAC ATCTGACAAA GCCCCCCAAA AAACCATCTT 
1001 TGTTCTGAAG ATCCTGGTCA TGCTTTACTA CCTGATGATG AATTCTTCAA 
1051 AGGCACAGGA ATATGGCATG AGGGCCCTCA GTCTAGCCAA AGAACAACAG 
1101 CTTGATGTCC ATGAGCAAAG CACCATTCAA GAGTTATTAA GTCTCATTTC 
1151 AACTGAAGAC CATCCCATTA CTTAGTGACC CATGAGCTCT GCATCAAGGG 
1201 TTATTCCAGG GGCTACTGAA GATCTAATAT ATTCCAGCCT TGCACAACTG 
1251 CTTTGAGGTA CTGTAGACTG CTGAAGTTTC CACCCTCTTC CCCTGGGATT 
1301 GCACACATAG CTGTTATTTT TTTCTTACAC AGCATATTAA GGGAATATAA 
1351 AGCTTTAGGC ATAGAAATCA CTAAAAACTG TGTTTGTCAT GACCTTTGTA 
1401 CTTGATTTAT CATGACTTTG TATGACTGAG TAATATGTAG TCAGATCACT 
1451 AATATGGTAT TTGTAATTAA ACTACAAATA GTTTGTCATT TCCCAGAAGT 
1501 CTTCCAACGA TGCATGTTTC ATACACTTTT GCTAAAGGAG GGGTAAAGGA 
1551 GGGGGTAGGG AATAAAGCTA TATTGGAACA AAAAAAAAA 
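
The poly(A) and polyadenylation-signal positions quoted for this insert can be reproduced from the last printed sequence row. The sketch below assumes the reported positions are 0-based offsets and that the signal searched for is the canonical hexamer AATAAA; both are inferences from the numbers in this record, and `find_polya_features` is a hypothetical helper.

```python
def find_polya_features(tail: str, tail_offset: int):
    """Locate a polyadenylation signal and the terminal poly(A) run.

    tail:        the 3' end of the cDNA (uppercase, no spaces)
    tail_offset: 0-based offset of tail[0] within the full insert
    Returns (signal_pos, polya_pos) as 0-based offsets; signal_pos is
    None when no AATAAA hexamer is present.
    """
    signal = tail.find("AATAAA")
    polya_start = len(tail.rstrip("A"))  # index of the first A of the terminal run
    signal_pos = tail_offset + signal if signal >= 0 else None
    return signal_pos, tail_offset + polya_start


# Final row of the insert: printed as starting at position 1551 (1-based),
# i.e. 0-based offset 1550.
row = "GGGGGTAGGG AATAAAGCTA TATTGGAACA AAAAAAAAA".replace(" ", "")
print(find_polya_features(row, 1550))
```

With these assumptions the sketch returns the same two positions as the report (signal at 1560, poly(A) stretch at 1579).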



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 78 bp to 1172 bp; peptide length: 365 
Category: putative protein 
Prosite motifs: IG_MHC (35-42) 



1 MNVIYPLAVP KGRRLCCEVC EAPAERVCAA CTVTYYCGVV HQKADWDSIH 

51 EKICQLLIPL RTSMPFYNSE EERQHGLQQL QQRQKYLIEF CYTIAQKYLF 

101 EGKHEDAVPA ALQSLRFRVK LYGLSSVELV PAYPLLAEAS LGLGRIVQAE 

151 EYLFQAQWTV LKSTDCSNAT HSLLHRNLGL LYIAKKNYEE ARYHLANDIY 

201 FASCAFGTED IRTSGGYFHL ANIFYDLKKL DLADTLYTKV SEIWHAYLNN 

251 HYQVLSQAHI QQMDLLGKLF ENDTGLDEAQ EAEAIRILTS ILNIRESTSD 

301 KAPQKTIFVL KILVMLYYLM MNSSKAQEYG MRALSLAKEQ QLDVHEQSTI 

351 QELLSLISTE DHPIT 
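
The relationship between the nucleotide listing and this peptide can be illustrated by translating from the stated ORF start (78 bp) with the standard genetic code. This is a minimal sketch: it uses only the first 100 nucleotides of the insert, assumes 1-based coordinates, and the names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, start_bp: int) -> str:
    """Translate from a 1-based start position until a stop codon or the end."""
    peptide = []
    for i in range(start_bp - 1, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        peptide.append(aa)
    return "".join(peptide)


# First two printed rows (positions 1-100) of the DKFZphtes3_35n24 insert.
prefix = (
    "CGATCGTCAC GTGACGCCGG GGTTCAGCGT ATCCTTGCTG GGCAACCGTC"
    "TTAGAGACCA GCACTGCTGG CTGCACCATG AATGTGATCT ACCCACTGGC"
).replace(" ", "")

print(translate(prefix, 78))
```

The seven residues produced from this prefix agree with the start of the peptide listed above (MNVIYPL...).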

BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35n24, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_35n24, frame 3 



Report for DKFZphtes3_35n24.3 



[LENGTH] 365 
[MW] 41768.24 
[pI] 5.82 
[BLOCKS] BL00273 Heat-stable enterotoxins proteins 
[PROSITE] MYRISTYL 1 
[PROSITE] IG_MHC 1 
[PROSITE] AMIDATION 1 
[PROSITE] CK2_PHOSPHO_SITE 7 
[PROSITE] TYR_PHOSPHO_SITE 4 
[PROSITE] PKC_PHOSPHO_SITE 3 
[PROSITE] ASN_GLYCOSYLATION 3 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 4.11 % 



SEQ MNVIYPLAVPKGRRLCCEVCEAPAERVCAACTVTYYCGVVHQKADWDSIHEKICQLLIPL 
PRD ccceeeeeccccceeeeeeeehhhhhhhheeeeeeeeeecccccccchhhhhhhhheeec 

SEQ RTSMPFYNSEEERQHGLQQLQQRQKYLIEFCYTIAQKYLFEGKHEDAVPAALQSLRFRVK 
SEG xxxxxxxxxxxxxxx 
PRD cccccccchhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhh 

SEQ LYGLSSVELVPAYPLLAEASLGLGRIVQAEEYLFQAQWTVLKSTDCSNATHSLLHRNLGL 
PRD hhccceeeeccccchhhhhccccchhhhhhhhhhhhhhhccccccccccccccccccccc 

SEQ LYIAKKNYEEARYHLANDIYFASCAFGTEDIRTSGGYFHLANIFYDLKKLDLADTLYTKV 
PRD eeeehhhhhhhhhhhhhheeeeeccccccccccccceeehhhhhhhhhhhhccceeeeeh 

SEQ SEIWHAYLNNHYQVLSQAHIQQMDLLGKLFENDTGLDEAQEAEAIRILTSILNIRESTSD 
PRD hhhhhhhhcccchhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhccccc 

SEQ KAPQKTIFVLKILVMLYYLMMNSSKAQEYGMRALSLAKEQQLDVHEQSTIQELLSLISTE 
PRD ccccceeeehhhhhhhhhhhhcccchhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhcc 

SEQ DHPIT 



Prosite for DKFZphtes3_35n24.3 

PS00001 168->172 ASN_GLYCOSYLATION PDOC00001 
PS00001 272->276 ASN_GLYCOSYLATION PDOC00001 
PS00001 322->326 ASN_GLYCOSYLATION PDOC00001 
PS00005 114->117 PKC_PHOSPHO_SITE PDOC00005 
PS00005 299->302 PKC_PHOSPHO_SITE PDOC00005 
PS00005 323->326 PKC_PHOSPHO_SITE PDOC00005 
PS00006 48->52 CK2_PHOSPHO_SITE PDOC00006 
PS00006 69->73 CK2_PHOSPHO_SITE PDOC00006 
PS00006 125->129 CK2_PHOSPHO_SITE PDOC00006 
PS00006 274->278 CK2_PHOSPHO_SITE PDOC00006 
PS00006 297->301 CK2_PHOSPHO_SITE PDOC00006 
PS00006 349->353 CK2_PHOSPHO_SITE PDOC00006 
PS00006 358->362 CK2_PHOSPHO_SITE PDOC00006 
PS00007 85->93 TYR_PHOSPHO_SITE PDOC00007 
PS00007 186->194 TYR_PHOSPHO_SITE PDOC00007 
PS00007 186->194 TYR_PHOSPHO_SITE PDOC00007 
PS00007 185->194 TYR_PHOSPHO_SITE PDOC00007 
PS00008 275->281 MYRISTYL PDOC00008 
PS00009 11->15 AMIDATION PDOC00009 
PS00290 35->42 IG_MHC PDOC00262 
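
Pattern hits of this kind can be reproduced by turning a Prosite pattern into a regular expression and scanning the peptide. The sketch below does this for the N-glycosylation pattern PS00001, N-{P}-[ST]-{P}, on the 365-residue peptide of frame 3; the function name `scan_n_glycosylation` is hypothetical, and positions are reported 1-based as an assumption matching the report.

```python
import re

# Peptide of DKFZphtes3_35n24, frame 3 (365 residues, as listed in this record).
PEPTIDE = (
    "MNVIYPLAVP KGRRLCCEVC EAPAERVCAA CTVTYYCGVV HQKADWDSIH"
    "EKICQLLIPL RTSMPFYNSE EERQHGLQQL QQRQKYLIEF CYTIAQKYLF"
    "EGKHEDAVPA ALQSLRFRVK LYGLSSVELV PAYPLLAEAS LGLGRIVQAE"
    "EYLFQAQWTV LKSTDCSNAT HSLLHRNLGL LYIAKKNYEE ARYHLANDIY"
    "FASCAFGTED IRTSGGYFHL ANIFYDLKKL DLADTLYTKV SEIWHAYLNN"
    "HYQVLSQAHI QQMDLLGKLF ENDTGLDEAQ EAEAIRILTS ILNIRESTSD"
    "KAPQKTIFVL KILVMLYYLM MNSSKAQEYG MRALSLAKEQ QLDVHEQSTI"
    "QELLSLISTE DHPIT"
).replace(" ", "")

# PS00001, N-{P}-[ST]-{P}, written as a lookahead so overlapping hits are kept.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def scan_n_glycosylation(seq: str) -> list:
    """Return the 1-based start positions of N-glycosylation pattern matches."""
    return [m.start() + 1 for m in N_GLYC.finditer(seq)]


print(scan_n_glycosylation(PEPTIDE))
```

The start positions found this way coincide with the three ASN_GLYCOSYLATION entries in the table (168, 272 and 322).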



(No Pfam data available for DKFZphtes3_35n24.3) 



DKFZphtes3_35n9 



group: metabolism 

DKFZphtes3_35n9 encodes a novel 607 amino acid protein which is a splice variant of human 
carboxylesterase (EC 3.1.1.1). 

The novel protein contains one carboxylesterase B1 and one B2 pattern. In comparison to 
EC 3.1.1.1, DKFZphtes3_35n9 shows an N-terminal extension, and aa 458-474 are missing. 

The new protein can find application in modulation of carboxylester metabolism and as a new 
enzyme for biotechnological production processes. 



carboxylesterase, splice variant 

5' extension of mRNA and N-terminal elongation of protein (64 aa), 
missing exon: aa 458-474 of JC5408 are missing 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2888 bp 

Poly A stretch at pos. 2878, no polyadenylation signal found 



1 CTCGGCCTGA GGTGCGAGAG AAGCGGTGAC CGCGGCCCTG GCTGCTCGGA 
51 CCCGGGAACA TGATGGTCGC TGGAGCAGAA GGCGCTGAGA AGGGACCACG 
101 GCGGCGCTGG GTCGTGCGAG CCAGTAGCGG GCTGAAACGT AGAGGCCAGA 
151 ACCAGGTCTC AGGGGGCACT AAAGGCGGTC GGAGGTAATC CCCACACCGC 
201 TTCCTCCTGG AAGTCAGGCT GGCCGGGAGC TCCCGTATCC AGGACGGTTG 
251 GTCGCCTCTG GCCTGGCAGG GATCCTAGTG TCTCGGGACC TCCCGGTGAC 
301 GCGCCTGCCT CCCCTGCTGC ACCATAGGCC CGGGAGTACG GCGTCCCCAC 
351 AGCTTGGACC GGCAGGGGCT CGTGAAATGT TTGTCAAGTG GATAAATGAC 
401 CATGGCCGTG GTCTCCGCGG GAGGTGAGGA AACTGAAAGC CACCGAGGAA 
451 AAGGGGGGCG CTCCTTAAGA AGTGCCGCGG TCACGTGTAC GTTTCAAAAG 
501 AATGGCGTGA CTGAGTAGGG AGGGGACCGC GGAGACCCTC AGACCCTGGA 
551 CTGTAAGGAG ATGAGGGGCC GTGAAGGGGA ACCCAGGAAA CTGAGTCCTG 
601 AAAGCAAGGA GGAACTTCCA GAATGAAGGG CGCCGACACT CCTTCCTGCC 
651 TTTGCTCAAG CGGTTCCTTC ACCCCGATCA AGTTCCTTCC CATTTCTCCA 
701 TCTGGGGGAT CCTGAACGTG CACATCCTCA GAGAAGCCCT CCTGGGGTCT 
751 CCAATTCTAG TTTATTGCCC CCTCCTATCG ATCCCCCAGC GCGCTCATCG 
801 GGCCTGTGGA CAAGGACAGG TTTGAAGAGA GGATTCCCTG GATCGCGGAA 
851 GGGCTGCAGG AATGGCACAG CCCCTTCCGA GGATGCCAAA GGAGCCCGGG 
901 CAAAGGAAAG TGGCCGTGCC CGGGCCTGCC TACCACTAGA TCCCCACCCA 
951 CCTATGACTG CTCAGTCCCG CTCTCCTACC ACACCCACCT TTCCCGGCCC 
1001 AAGCCAGCGC ACCCCGCTGA CTCCCTGCCC AGTCCAAACT CCAAGGCTGG 
1051 GCAAGGCACT GATCCACTGC TGGACAGACC CGGGGCAGCC TCTGGGTGAA 
1101 CAGCAGCGTG TCCGCCGGCA GCGAACCGAG ACCAGCGAGC CGACCATGCG 
1151 GCTGCACAGA CTTCGTGCGC GGCTGAGCGC GGTGGCCTGT GGGCTTCTGC 
1201 TGCTTCTTGT CCGGGGCCAG GGCCAGGACT CAGCCAGTCC CATCCGGACC 
1251 ACACACACGG GGCAGGTGCT GGGGAGTCTT GTCCATGTGA AGGGCGCCAA 
1301 TGCCGGGGTC CAAACCTTCC TGGGAATTCC ATTTGCCAAG CCACCTCTAG 
1351 GTCCGCTGCG ATTTGCACCC CCTGAGCCCC CTGAATCTTG GAGTGGTGTG 
1401 AGGGATGGAA CCACCCATCC GGCCATGTGT CTACAGGACC TCACCGCAGT 
1451 GGAGTCAGAG TTTCTTAGCC AGTTCAACAT GACCTTCCCT TCCGACTCCA 
1501 TGTCTGAGGA CTGCCTGTAC CTCAGCATCT ACACGCCGGC CCATAGCCAT 
1551 GAAGGCTCTA ACCTGCCGGT GATGGTGTGG ATCCACGGTG GTGCGCTTGT 
1601 TTTTGGCATG GCTTCCTTGT ATGATGGTTC CATGCTGGCT GCCTTGGAGA 
1651 ACGTGGTGGT GGTCATCATC CAGTACCGCC TGGGTGTCCT GGGCTTCTTC 
1701 AGCACTGGAG ACAAGCACGC AACCGGCAAC TGGGGCTACC TGGACCAAGT 
1751 GGCTGCACTA CGCTGGGTCC AGCAGAATAT CGCCCACTTT GGAGGCAACC 
1801 CTGACCGTGT CACCATTTTT GGCGAGTCTG CGGGTGGCAC GAGTGTGTCT 
1851 TCGCTTGTTG TGTCCCCCAT ATCCCAAGGA CTCTTCCACG GAGCCATCAT 
1901 GGAGAGTGGC GTGGCCCTCC TGCCCGGCCT CATTGCCAGC TCAGCTGATG 
1951 TCATCTCCAC GGTGGTGGCC AACCTGTCTG CCTGTGACCA AGTTGACTCT 
2001 GAGGCCCTGG TGGGCTGCCT GCGGGGCAAG AGTAAAGAGG AGATTCTTGC 
2051 AATTAACAAG CCTTTCAAGA TGATCCCCGG AGTGGTGGAT GGGGTCTTCC 
2101 TGCCCAGGCA CCCCCAGGAG CTGCTGGCCT CTGCCGACTT TCAGCCTGTC 
2151 CCTAGCATTG TTGGTGTCAA CAACAATGAA TTCGGCTGGC TCATCCCCAA 
2201 GGTCATGAGG ATCTATGATA CCCAGAAGGA AATGGACAGA GAGGCCTCCC 
2251 AGGCTGCTCT GCAGAAAATG TTAACGCTGC TGATGTTGCC TCCTACATTT 
2301 GGTGACCTGC TGAGGGAGGA GTACATTGGG GACAATGGGG ATCCCCAGAC 
2351 CCTCCAAGCG CAGTTCCAGG AGATGATGGC GGACTCCATG TTTGTGATCC 
2401 CTGCACTCCA AGTAGCACAT TTTCAGTGTT CCCGGGCCCC TGTGTACTTC 
2451 TACGAGTTCC AGCATCAGCC CAGCTGGCTC AAGAACATCA GGCCACCGCA 
2501 CATGAAGGCA GACCATGTTA AATTCACTGA GGAAGAGGAG CAGCTAAGCA 
2551 GGAAGATGAT GAAGTACTGG GCCAACTTTG CGAGAAATGG GAACCCCAAT 
2601 GGCGAGGGTC TGCCACACTG GCCGCTGTTC GACCAGGAGG AGCAATACCT 
2651 GCAGCTGAAC CTACAGCCTG CGGTGGGCCG GGCTCTGAAG GCCCACAGGC 
2701 TCCAGTTCTG GAAGAAGGCG CTGCCCCAAA AGATCCAGGA GCTCGAGGAG 
2751 CCTGAAGAGA GACACACAGA GCTGTAGCTC CCTGTGCCGG GGAGGAGGGG 
2801 GTGGGTTCGC TGACAGGCGA GGGTCAGCCT GCTGTGCCCA CACACACCCA 
2851 CTAAGGAGAA AGAAGTTGAT TCCTTCATAA AAAAAAAA 



BLAST Results 



Entry D50579 from database EMBL: 

Homo sapiens mRNA for carboxylesterase, complete cds. 
Score = 7197, P = 0.0e+00, identities = 1441/1443 

Entry JC5408 from database PIR: 
carboxylesterase (EC 3.1.1.1) - human 

Score = 2808, P = 1.2e-291, identities = 542/559, positives = 543/559, 
frame +3 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 954 bp to 2774 bp; peptide length: 607 
Category: known protein 
Classification: Metabolism 

Prosite motifs: CARBOXYLESTERASE_B_1 (279-295) 
CARBOXYLESTERASE_B_2 (185-196) 



1 MTAQSRSPTT PTFPGPSQRT PLTPCPVQTP RLGKALIHCW TDPGQPLGEQ 
51 QRVRRQRTET SEPTMRLHRL RARLSAVACG LLLLLVRGQG QDSASPIRTT 
101 HTGQVLGSLV HVKGANAGVQ TFLGIPFAKP PLGPLRFAPP EPPESWSGVR 
151 DGTTHPAMCL QDLTAVESEF LSQFNMTFPS DSMSEDCLYL SIYTPAHSHE 
201 GSNLPVMVWI HGGALVFGMA SLYDGSMLAA LENVVVVIIQ YRLGVLGFFS 
251 TGDKHATGNW GYLDQVAALR WVQQNIAHFG GNPDRVTIFG ESAGGTSVSS 
301 LVVSPISQGL FHGAIMESGV ALLPGLIASS ADVISTVVAN LSACDQVDSE 
351 ALVGCLRGKS KEEILAINKP FKMIPGVVDG VFLPRHPQEL LASADFQPVP 
401 SIVGVNNNEF GWLIPKVMRI YDTQKEMDRE ASQAALQKML TLLMLPPTFG 
451 DLLREEYIGD NGDPQTLQAQ FQEMMADSMF VIPALQVAHF QCSRAPVYFY 
501 EFQHQPSWLK NIRPPHMKAD HVKFTEEEEQ LSRKMMKYWA NFARNGNPNG 
551 EGLPHWPLFD QEEQYLQLNL QPAVGRALKA HRLQFWKKAL PQKIQELEEP 
601 EERHTEL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35n9, frame 3 

PIR:JC5408 carboxylesterase (EC 3.1.1.1) - human, N = 1, Score = 2808, 
P = 1.9e-292 

TREMBL:HSU60553_1 gene: "hCE-2"; product: "carboxylesterase"; Human 
carboxylesterase (hCE-2) mRNA, complete cds., N = 1, Score = 2761, P = 
1.8e-287 

PIR:A34329 60K esterase (EC 3.1.1.-) isoform 2 - rabbit, N = 1, Score = 
1985, P = 3.1e-205 

TREMBL:D50580_1 product: "carboxylesterase precursor"; Rattus 
norvegicus mRNA for carboxylesterase, partial cds., N = 1, Score = 
1984, P = 4e-205 



>PIR:JC5408 carboxylesterase (EC 3.1.1.1) - human 
Length = 559 

HSPs: 

Score = 2808 (421.3 bits), Expect = 1.9e-292, P = 1.9e-292 
Identities = 542/559 (96%), Positives = 543/559 (97%) 

Query: 65 MRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQTFLG 124 

MRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQTFLG 
Sbjct: 1 MRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQTFLG 60 

Query: 125 IPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPSDSMS 184 

IPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPSDSMS 
Sbjct: 61 IPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPSDSMS 120 

Query: 185 EDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQYRLG 244 

EDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQYRLG 
Sbjct: 121 EDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQYRLG 180 

Query: 245 VLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVVS 304 

VLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVVS 
Sbjct: 181 VLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVVS 240 

Query: 305 PISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKSKEEI 364 

PISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKSKEEI 
Sbjct: 241 PISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKSKEEI 300 

Query: 365 LAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDTQ 424 

LAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDTQ 
Sbjct: 301 LAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDTQ 360 

Query: 425 KEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMFVIPA 484 

KEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMFVIPA 
Sbjct: 361 KEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMFVIPA 420 

Query: 485 LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADH----------------VKFTEEE 528 

LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADH                +KFTEEE 
Sbjct: 421 LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADHGDELPFVFRSFFGGNYIKFTEEE 480 

Query: 529 EQLSRKMMKYWANFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKK 588 

EQLSRKMMKYWANFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKK 
Sbjct: 481 EQLSRKMMKYWANFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKK 540 

Query: 589 ALPQKIQELEEPEERHTEL 607 

ALPQKIQELEEPEERHTEL 
Sbjct: 541 ALPQKIQELEEPEERHTEL 559 

Pedant information for DKFZphtes3_35n9, frame 3 

Report for DKFZphtes3_35n9.3 



[LENGTH] 607 
[MW] 67051.20 
[pI] 6.11 
[HOMOL] PIR:JC5408 carboxylesterase (EC 3.1.1.1) - human 0.0 
[BLOCKS] BL01173A Lipolytic enzymes "G-D-X-G" family, histidine 
[BLOCKS] BL00122G 
[BLOCKS] BL00122F 
[BLOCKS] BL00122E 
[BLOCKS] BL00122D Carboxylesterases type-B serine proteins 
[BLOCKS] BL00122C Carboxylesterases type-B serine proteins 
[BLOCKS] BL00122B Carboxylesterases type-B serine proteins 
[BLOCKS] BL00122A Carboxylesterases type-B serine proteins 
[SCOP] d1akn 3.56.1.1.4 Bile-salt activated lipase [Bovine (Bos taurus 1e-158 
[SCOP] d2ack 3.56.1.1.1 Acetylcholinesterase [Electric ray (Torped 1e-170 
[SCOP] d1thg_ 3.56.1.9.7 type-B carboxylesterase/lipase [fungu 1e-149 
[EC] 3.1.1.13 Sterol esterase 1e-52 
[EC] 3.1.1.7 Acetylcholinesterase 5e-74 
[EC] 3.1.1.1 Carboxylesterase 0.0 
[EC] 3.1.1.8 Cholinesterase 5e-68 
[EC] 3.1.1.59 Juvenile-hormone esterase 1e-34 
[EC] 3.1.1.3 Triacylglycerol lipase 3e-52 
[PIRKW] duplication 2e-47 
[PIRKW] homotetramer 3e-67 
[PIRKW] transmembrane protein 9e-44 
[PIRKW] microsome 1e-130 
[PIRKW] pancreas 3e-52 
[PIRKW] endoplasmic reticulum 1e-134 
[PIRKW] homotrimer 1e-134 
[PIRKW] phosphatidylinositol linkage 5e-74 
[PIRKW] synapse 3e-73 
[PIRKW] liver 1e-131 
[PIRKW] heparin binding 3e-52 
[PIRKW] phosphoprotein 7e-25 
[PIRKW] glycoprotein 1e-134 
[PIRKW] thyroid hormone biosynthesis 2e-47 
[PIRKW] carboxylic ester hydrolase 0.0 
[PIRKW] monomer 2e-42 
[PIRKW] disulfide bond 2e-31 
[PIRKW] mammary gland 3e-52 
[PIRKW] alternative splicing 5e-74 
[PIRKW] iodine 2e-47 
[PIRKW] pyroglutamic acid 6e-39 
[PIRKW] hydrolase 1e-135 
[PIRKW] muscle 3e-73 
[PIRKW] thyroid gland 2e-47 
[PIRKW] membrane protein 3e-73 
[PIRKW] neurotransmitter degradation 3e-73 
[PIRKW] cholesterol 3e-52 
[PIRKW] homodimer 2e-47 
[PIRKW] nerve 3e-73 
[SUPFAM] cholinesterase 0.0 
[SUPFAM] triacylglycerol lipase 1e-32 
[SUPFAM] cholinesterase homology 0.0 
[SUPFAM] thyroglobulin 2e-47 
[SUPFAM] thyroglobulin type I repeat homology 2e-47 
[SUPFAM] juvenile-hormone esterase 2e-35 
[SUPFAM] probable lipolytic protein ybaC 1e-07 
[PROSITE] CARBOXYLESTERASE_B_2 1 
[PROSITE] CARBOXYLESTERASE_B_1 1 
[PFAM] Carboxylesterases 
[KW] Alpha_Beta 
[KW] 3D 
[KW] LOW_COMPLEXITY 3.95 % 





SEQ   MTAQSRSPTTPTFPGPSQRTPLTPCPVQTPRLGKALIHCWTDPGQPLGEQQRVRRQRTET 
SEG   xxxxxxxx... 
1acj- 

SEQ   SEPTMRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQ 
SEG   xxxxx 
1acj- ETTEEEECEEEEETTEE--EE 

SEQ   TFLGIPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPS 
SEG 
1acj- EEEEEECEETTTGGGTTTCCEECCCCCCEEECCCCCCBCCCCCCTTTTTT-HHHHHCCCC 

SEQ   DSMSEDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQ 
SEG 
1acj- CCBTTTTCEEEEEET--TTTTTTEEEEEEECTTTTTTCTTTTGCHHHHHHHHCCEEEECC 

SEQ   YRLGVLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSS 
SEG 
1acj- 

SEQ   LVVSPISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKS 
SEG 
1acj- HHHCGGGTTTTCEEEEETTTTTTTTTTBCHHHHHHHHHHHHC-CCCCCHHHHHHHHHHCC 

SEQ   KEEILAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRI 
SEG 
1acj- HHHHHHHHTCCCTTTCBTTTTTTTTTHHHHHHHTTTCCCCEEEEEETBTHHHHHHTTTTT 

SEQ   YDTQKEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMF 
SEG 
1acj- TTTCCCCCHHHHHHHHHHHTTTTCHHHHHHHHHHCTTTTTTTHHHH-HHHHHHHHHHHHH 

SEQ   VIPALQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADHVKFTEEEEQLSRKMMKYWA 
SEG 
1acj- HHHHHHHHHHHHCCCCEEEEEECCCCGGGTTBTTTHHHCGGGCCCHHHHHHHHHHHHHHH 

SEQ   NFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKKALPQKIQELEEP 
SEG   ....xxxxx 
1acj- HHHHHCCCCCCC--CCCCBTTTTBEEEECCCCCEEETTTHHHHHHHHHHHHH . 

SEQ   EERHTEL 
SEG   xxxxxx. 
1acj- 



Prosite for DKFZphtes3_35n9.3 

PS00122 279->295 CARBOXYLESTERASE_B_1 PDOC00112 
PS00941 185->196 CARBOXYLESTERASE_B_2 PDOC00112 



Pfam for DKFZphtes3_35n9.3 

HMM_NAME Carboxylesterases 

HMM      *MfMnwlimFLLwmltwIi.WheqaprpPdPyiVdtnnCGJcIRGmNedtD 
          + +L+++ ++++++++ ++Q++++P I T+ G + G ++ + 
Query  69 RLRARLSAVACGLLLLLVRGQGQDSASP IRTTHT-GQVLGSLVHVK 113 

HMM       NG..pYYvFlGIPYAEPPVGNLRFKePQPYhePWtNVWNATnYPPMCMQW 
          + + +FLGI P+A+PP+G LRF +P+P +E W++V++ T+ P MC+Q+ 
Query 114 GANAGVQTFLGIPFAKPPLGPLRFAPPEP-PESWSGVRDGTTHPAMCLQD 162 

HMM       ndFGFWlFdmieMWNeniP..eMSEDCLYLNVWTPWnrkPNskLPVMVWI 
          +++ +++N++ P +MSEDCLYL+++TP+ + ++S+LPVMVWI 
Query 163 LTAV--ESEFLSQFNMTFPSDSMSEDCLYLSIYTPAHSHEGSNLPVMVWI 210 

HMM       HGGGFMFGSGhsYPliqYDgeylMMeeNVIVVtlNYRLGPFGFLSTgDid 
          HGG+++FG + ++YDG+ L++ ENV+W I+YRLG++GF+STGD + 
Query 211 HGGALVFGMA-----SLYDGSMLAALENVVVVIIQYRLGVLGFFSTGDKH 255 

HMM       lPPHGNWGLWDQRMALQWVQDNIAnFGGDPNNITIFGESAGGMSVHlHML 
          + GNWG++DQ++AL+WVQ+NIA+FGG+P+++TIFGESAGG+SV+ ++ 
Query 256 AT--GNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVV 303 

HMM       SYGGDNPPmfKqLFHRAIMQSGsAmcPWvIQsnyNaRqRAfRFArimGCN 
          S P + +LFH AIM+SG A+ P++I S++ + +A++ C+ 
Query 304 S------PISQGLFHGAIMESGVALLPGLIASSA--DVISTVVANLSACD 345 

HMM       rmDs sEMIqCLRsKPvEELWdAtWnFWmWf Yf PF1 PWFFgPVI DGDDaPE 
          + DS++ + + CLR K+ EE++ + +++ +F + + +DG+ 
Query 346 QVDSEALVGCLRGKSKEEILAINK PFKMIPGV--VDGV 381 

HMM       aFIPDHPeeMI IcEGkFnDVPWllGYNnDEGiWFapMmMnf nWfdEDeWId 
          F+P+HP+E++++ F VP I+G+NN E++W++P M + + +E++ 
Query 382 -FLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDT-QKEMDR 429 

HMM       itNedWyeWMPYIlFYrddmsNikDMDDYiDkvyEeYPgWWDrFPqESYW 
          ++ + ++ M +L + + + D ++EEY+G+ + PQ 
Query 430 EASQAALQKMLTLLMLPPT-F GDLLREEYIGDNGD-PQTLQA 469 

HMM       nLqDMFTDYLFWCPtRihadnHRkHwgsPVYMYeFDHPpSFGYgQFFmWR 
          ++Q+M+ D F++P + ++H++ +PVY+YEF+H PS + 
Query 470 QFQEMMADSMFVIP--ALQVAHFQCSRAPVYFYEFQHQPSW LKN 511 

HMM       WWPpWMgvdH* 
          +PP+M++DH 
Query 512 IRPPHMKADH 521 

HMM      *tEEEiissMRmMMNYWINFAKhGNPNnthnglCWWPqYTsnEQYdMIMe 
          TEEE+ +S R MM+YW+NFA++GNPN++ GL++WP ++++EQY++ + 
Query 525 TEEEEQLS-RKMMKYWANFARNGNPNGE--GLPHWPLFDQEEQYLQLNL 570 

HMM       1 1 ImiQmC rrar DP YCNFW * 
          + +++++ + FW 
Query 571 QPAVGRALKAHR--LQFW 



DKFZphtes3_35p17 



group: testes derived 

DKFZphtes3_35p17 encodes a novel 505 amino acid protein with weak similarity to 
proteins of the armadillo family. 

Proteins of the armadillo family are involved in diverse cellular processes in higher 
eukaryotes. Some of them, like armadillo, beta-catenin and plakoglobins, have dual functions in 
intercellular junctions and signalling cascades. Others, belonging to the importin-alpha 
subfamily, are involved in NLS recognition and nuclear transport, while some members of the 
armadillo family have as yet unknown functions. The novel protein shows similarity to S. 
cerevisiae protein Yel013p (VAC8) and Danio rerio b-catenin, but contains no armadillo (arm) 
repeats. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to S. cerevisiae VAC8 

complete cDNA, complete cds, few EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1966 bp 

Poly A stretch at pos. 1956, polyadenylation signal at pos. 1935 



1 AAGTCAAATG TAAGATTGGT TCATTAAAAA TACTGAAGGA AATCAGTCAT 
51 AATCCTCAAA TCAGACAGAA TATTGTTGAC CTTGGGGGCT TACCAATTAT 
101 GGTGAATATA CTTGATTCTC CACACAAGAG TCTAAAATGT TTGGCAGCCG 
151 AGACTATCGC GAATGTTGCC AAGTTTAAAA GAGCACGGCG GGTGGTGAGG 
201 CAGCACGGGG GTATCACCAA ACTGGTTGCT CTACTAGACT GTGCACATGA 
251 TTCCACAAAA CCTGCCCAAT CGAGTCTGTA TGAGGCCAGA GACGTGGAAG 
301 TGGCTCGCTG TGGGGCACTG GCCCTGTGGA GCTGCAGTAA GAGTCATACG 
351 AATAAAGAAG CCATCCGCAA AGCTGGGGGC ATTCCTCTGT TGGCTCGGCT 
401 GCTGAAGACT TCTCATGAAA ACATGCTAAT TCCAGTGGTG GGGACATTGC 
451 AAGAGTGTGC ATCAGAGGAA AACTACCGGG CTGCAATCAA AGCAGAAAGG 
501 ATCATTGAAA ACCTTGTCAA GAACCTAAAT AGTGAGAATG AGCAGCTGCA 
551 GGAGCACTGC GCCATGGCCA TTTACCAGTG TGCTGAAGAT AAGGAAACCC 
601 GGGACCTCGT TAGGCTGCAC GGAGGACTTA AGCCCTTGGC CAGTCTACTC 
651 AATAACACTG ACAATAAAGA GCGGTTAGCT GCTGTCACAG GGGCTATATG 
701 GAAATGTTCC ATCAGCAAAG AGAATGTTAC CAAGTTTCGG GAATACAAAG 
751 CCATTGAAAC CTTGGTGGGA CTTCTAACAG ATCAGCCTGA AGAAGTACTT 
801 GTGAATGTGG TTGGGGCCTT GGGAGAATGC TGCCAAGAAC GTGAAAACCG 
851 AGTCATTGTC CGGAAATGTG GTGGCATTCA ACCACTTGTG AACCTCCTTG 
901 TTGGAATAAA CCAAGCTCTT CTTGTGAATG TTACAAAAGC AGTTGGTGCT 
951 TGTGCAGTAG AACCTGAAAG TATGATGATA ATTGATCGCT TAGATGGAGT 
1001 TCGTTTGTTG TGGTCCCTGC TGAAAAATCC TCACCCAGAC GTGAAGGCCA 
1051 GCGCAGCATG GGCACTCTGT CCATGCATCA AAAATGCAAA GGATGCTGGG 
1101 GAAATGGTTC GTTCCTTTGT TGGTGGTTTG GAACTTATTG TCAATTTACT 
1151 GAAATCAGAT AACAAAGAAG TTCTGGCAAG TGTATGTGCT GCCATTACCA 
1201 ACATAGCAAA AGATCAAGAA AATTTAGCTG TTATCACAGA TCATGGAGTT 
1251 GTTCCTTTAT TGTCCAAACT GGCAAATACA AATAACAATA AATTGAGACA 
1301 TCATCTAGCA GAAGCTATTT CACGTTGCTG TATGTGGGGC AGGAATAGAG 
1351 TGGCCTTCGG TGAGCACAAA GCAGTGGCTC CACTAGTGCG TTATCTGAAA 
1401 TCAAATGACA CCAACGTGCA TCGGGCGACA GCTCAGGCCT TGTACCAACT 
1451 CTCAGAAGAC GCCGATAACT GCATCACCAT GCATGAGAAT GGTGCAGTAA 
1501 AGCTTCTACT GGATATGGTT GGGTCCCCTG ACCAGGATCT CCAGGAAGCT 
1551 GCAGCTGGTT GTATATCCAA TATCCGCAGG CTGGCTCTTG CTACAGAGAA 
1601 GGCAAGATAC ACTTGAAATT TAAATGGACA TTACAAGCTA TCAAATTCTA 
1651 CATGACACAG GACATGTCAC TCCCATGGCC AGAAAGCCTA AATTGGGAAA 
1701 CAGTTGTTAG CAAACCCTTT CAACCATCTA AATGAAAACA CACAAATTGA 
1751 AAATGCACAG AATGTTTTTC ATCTGAAAAT TGCATGGAGA CTTTTGTTTC 
1801 TATTTAATGT TTTCGAGATA TGACATGTGA TAAGATGGAA AGCCAATAAA 
1851 CCTGTGATAA GTTTCTAAGA ATATGAGAAT ATACGTATAT GATGTATTTT 
1901 TAGTTCAGTG ATGCTTTTGT ATTTGTGGCG ATTTTAATAA AGGATATGGC 
1951 CTTCCCAAAA AAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



98413148: 

Yel013p (Vac8p), an armadillo repeat protein related to plakoglobin and 
importin alpha is associated with the yeast 
vacuole membrane. 

98330438: 

YEB3/VAC8 encodes a myristylated armadillo protein of the Saccharomyces 
cerevisiae vacuolar membrane that functions in vacuole fusion and 
inheritance. 

98158703: 

Vac8p, a vacuolar protein with armadillo repeats, functions in both 
vacuole inheritance and protein targeting from the 
cytoplasm to vacuole. 



Peptide information for frame 3 



ORF from 99 bp to 1613 bp; peptide length: 505 
Category: similarity to known protein 
Classification: unset 



1 MVNILDSPHK SLKCLAAETI ANVAKFKRAR RVVRQHGGIT KLVALLDCAH 
51 DSTKPAQSSL YEARDVEVAR CGALALWSCS KSHTNKEAIR KAGGIPLLAR 
101 LLKTSHENML IPVVGTLQEC ASEENYRAAI KAERIIENLV KNLNSENEQL 
151 QEHCAMAIYQ CAEDKETRDL VRLHGGLKPL ASLLNNTDNK ERLAAVTGAI 
201 WKCSISKENV TKFREYKAIE TLVGLLTDQP EEVLVNVVGA LGECCQEREN 
251 RVIVRKCGGI QPLVNLLVGI NQALLVNVTK AVGACAVEPE SMMIIDRLDG 
301 VRLLWSLLKN PHPDVKASAA WALCPCIKNA KDAGEMVRSF VGGLELIVNL 
351 LKSDNKEVLA SVCAAITNIA KDQENLAVIT DHGVVPLLSK LANTNNNKLR 
401 HHLAEAISRC CMWGRNRVAF GEHKAVAPLV RYLKSNDTNV HRATAQALYQ 
451 LSEDADNCIT MHENGAVKLL LDMVGSPDQD LQEAAAGCIS NIRRLALATE 
501 KARYT 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35p17, frame 3 

PIR:S50446 VAC8 protein - yeast (Saccharomyces cerevisiae), N = 1, 
Score = 237, P = 7.8e-17 

PIR:T00403 T13E15.9 protein - Arabidopsis thaliana, N = 1, Score = 215, 
P = 4.9e-14 

TREMBL:DR41081_1 product: "b-catenin"; Danio rerio b-catenin mRNA, 
complete cds., N = 1, Score = 195, P = 5.8e-12 



>PIR:S50446 VAC8 protein - yeast (Saccharomyces cerevisiae) 
Length = 578 

HSPs: 

Score = 237 (35.6 bits), Expect = 7.8e-17, P = 7.8e-17 
Identities = 106/401 (26%), Positives = 177/401 (44%) 

Query: 92 AGGIPLLARLLKTSHENMLIPVVGTLQECASEENYRAAIKAERIIENLVKNLNSENEQLQ 151 

+GG PL A +N+ + L E Y " + E ++E ++ L S++ Q+Q 

Sbjct: 45 SGG-PLKALTTLVYSDNLNLQRSAALAFAEITEKYVRQVSRE-VLEPILILLQSQDPQIQ 102 

Query: 152 EHCAMAIYQCAEDKETRDLVRLHGGLKPLASLLNNTDNKERLAAVTGAIWKCSISKENVT 211 

A+ A + E + L+ GGL+PL + + DN E G I + +N 

Sbjct: 103 VAACAALGNLAVNNENKLLIVEMGGLEPLINQMMG-DNVEVQCNAVGCITNLATRDDNKH 161 

Query: 212 KFREYKAIETLVGLLTDQPEEVLVNVVGALGECCQERENRVIVRKCGGIQPLVNLLVGIN 271 

K A+ L L + V N GAL ENR + G + LV+LL + 

Sbjct: 162 KIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLSSTD 221 

Query: 272 QALLVNVTKAVGACAVEPESMMIIDRLDG--VRLLWSLLKNPHPDVKASAAWALCPCIKN 329 

+ T A+ AV+ + + + + V L SL+ +P VK A AL + 

Sbjct: 222 PDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNLASD 281 

Query: 330 AKDAGEMVRSFVGGLELIVNLLKSDNKE-VLASVCAAITNIAKDQENLAVITDHGVV-PL 387 

E+VR+ GGL +V L++SD+ VLASV A I NI + N +I D G + PL 
Sbjct: 282 TSYQLEIVRA-- GGLPHLVKLIQSDSIPLVLASV-ACIRNISIHPLNEGLIVDAGFLKPL 338 

Query: 388 LSKLANTNNNKLRHHLAEAISRCCMWG-RNRVAFGEHKAVAPLVRYLKSNDTNVHRATAQ 446 

+ L ++ +++ H + +NR F E AV + +V + + 

Sbjct: 339 VRLLDYKDSEEIQCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSV-QSEIS 397 

Query: 447 ALYQLSEDAD-NCITMHENGAVKLLLDMVGSPDQDLQEAAAGCISNI 492 

A + + AD+++E +L+MS +Q++ AA ++N+ 
Sbjct: 398 ACFAILALADVSKLDLLEANILDALIPMTFSQNQEVSGNAAAALANL 444 
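
The percentages in the HSP header lines of this listing can be reproduced from the match counts. Throughout this section they appear to be truncated rather than rounded (for example, 163/341 is about 47.8% but is reported as 47%). The sketch below checks a header line under that assumption; the function names are illustrative and not part of any BLAST tool.

```python
import re

# Matches lines such as "Identities = 106/401 (26%), Positives = 177/401 (44%)"
HSP_RE = re.compile(
    r"Identities = (\d+)/(\d+) \((\d+)%\), Positives = (\d+)/(\d+) \((\d+)%\)"
)


def truncated_percent(matches: int, length: int) -> int:
    """Integer percentage, truncated toward zero (assumed reporting convention)."""
    return matches * 100 // length


def check_hsp_line(line: str) -> bool:
    """Return True if both reported percentages match the truncated values."""
    m = HSP_RE.search(line)
    if m is None:
        raise ValueError("not an Identities/Positives line")
    ident, ilen, ipct, pos, plen, ppct = map(int, m.groups())
    return (truncated_percent(ident, ilen) == ipct
            and truncated_percent(pos, plen) == ppct)


print(check_hsp_line("Identities = 106/401 (26%), Positives = 177/401 (44%)"))
```

A check like this is a cheap way to spot digits damaged during text extraction.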

Score = 213 (32.0 bits), Expect = 3.6e-14, P = 3.6e-14 
Identities = 81/341 (23%), Positives = 163/341 (47%) 

Query: 163 EDKETRDLVRLHGGLKPLASLLNNTD-NKERLAAVTGAIWKCSISKENVTKFREYKAIET 221 

EDK+ D G LK L +L+ + + N +R AA+ A I+++ V + + +E 

Sbjct: 36 EDKDQLDFYS-GGPLKALTTLVYSDNLNLQRSAALAFA EITEKYVRQVSR-EVLEP 89 

Query: 222 LVGLLTDQPEEVLVNVVGALGECCQERENRVIVRKCGGIQPLVNLLVGINQALLVNVTKA 281 

++ LL Q ++ V ALG EN++++ + GG++PL+N ++G N + N 

Sbjct: 90 ILILLQSQDPQIQVAACAALGNLAVNNENKLLIVEMGGLEPLINQMMGDNVEVQCNAVGC 149 

Query: 282 VGACAVEPESMMIIDRLDGVRLLWSLLKNPHPDVKASAAWALCPCIKNAKDAGEMVRSFV 341 

+ A ++ I + L L K+ H V> +A AL + ++ E+V + 

Sbjct: 150 ITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNA-- 207 

Query: 342 GGLELIVNLLKSDNKEVLASVCAAITNIAKDQENLAVI--TDHGVVPLLSKLANTNNNKL 399 

G + ++V+LL S + +V A++NIA D+ N + T+ +V L L ++ ++++ 

Sbjct: 208 GAVPVLVSLLSSTDPDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRV 267 

Query: 400 RHHLAEAISRCCMWGRNRVAFGEHKAVAPLVRYLKSNDTNVHRATAQALYQLSEDADNCI 459 

+ A+ ++ ♦ LV+ ++S+ + A+ + +S N 

Sbjct: 268 KCQATLALRNLASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVACIRNISIHPLNEG 327 

Query: 460 TMHENGAVKLLLDMVGSPDQDLQEAAAGCISNIRRLALATEKAR 503 

"¥ ♦ + G +K L+ ++ D + E +S +R LA ++EK R 

Sbjct: 328 LIVDAGFLKPLVRLLDYKDSE--EIQCHAVSTLRNLAASSEKNR 369 

Score = 180 (27.0 bits), Expect = 1.6e-10, P = 1.6e-10 
Identities = 80/346 (23%), Positives = 142/346 (41%) 

Query: 145 SENEQLQEHCAMAIYQCAEDKETRDLVRLHGGLKPLASLLNNTDNKERLAAVTGAIWKCS 204 

S+N LQ A+A + E K R + R L+P+ LL + D + ++AA A+ + 
Sbjct: 58 SDNLNLQRSAALAFAEITE-KYVRQVSR--EVLEPILILLQSQDPQIQVAACA-ALGNLA 113 

Query: 205 ISKENVTKFREYKAIETLVGLLTDQPEEVLVNVVGALGECCQERENRVIVRKCGGIQPLV 264 

++ EN E +E L+ + EV N VG + +N+ + G + PL 

Sbjct: 114 VNNENKLLIVEMGGLEPLINQMMGDNVEVQCNAVGCITNLATRDDNKHKIATSGALIPLT 173 

Query: 265 NLLVGINQALLVNVTKAVGACAVEPESMMIIDRLDGVRLLWSLLKNPHPDVKASAAWALC 324 

L + + N T A+ E+ + V +L SLL + PDV+ AL 

Sbjct: 174 KLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLSSTDPDVQYYCTTALS 233 

Query: 325 PCIKNAKDAGEMVRSFVGGLELIVNLLKSDNKEVLASVCAAITNIAKDQENLAVITDHGV 384 

++++++ + +V+L+ S + V A+ N+A D I G 

Sbjct: 234 NIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNLASDTSYQLEIVRAGG 293 

Query: 385 VPLLSKLANTNNNKLRHHLAEAISRCCMWGRNRVAFGEHKAVAPLVRYLKSNDTNVHRAT 444 

+ p L KL *++ L I + N + + PLVR L D+ + 

Sbjct: 294 LPHLVKLIQSDSIPLVLASVACIRNISIHPLNEGLIVDAGFLKPLVRLLDYKDSEEIQCH 353 

Query: 445 A-QALYQLSEDAD-NCITMHENGAVKLLLDMVGSPDQDLQEAAAGCIS 490 

A L L+ ++ N E+GAV+ ++ +Q + C + 

Sbjct: 354 AVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSVQSEISACFA 401 

Score = 155 (23.3 bits), Expect = 8.8e-08, P = 8.8e-08 
Identities = 88/401 (21%), Positives = 175/401 (43%) 


Query: 60 LYEARD--VEVARCGALALWSCSKSHTNKEAIRKAGGI-PLLARLLKTSHENMLIPVVGT 116 
L +++D ++VA C AL + + ++ NK I + GG+ PL+ +++ + E + VG 
Sbjct: 93 LLQSQDPQIQVAACAALG--NLAVNNENKLLIVEMGGLEPLINQMMGDNVE-VQCNAVGC 149 

Query: 117 LQECASEENYRAAIKAERIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETR-DLVRLHG 175 
+ A+ ++ + I + L K S++ ++Q + A+ +E R +LV G 
Sbjct: 150 ITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNA-G 208 

Query: 176 GLKPLASLLNNTDNKERLAAVTGAIWKCSISKENVTKFR--EYKAIETLVGLLTDQPEEV 233 
+ L SLL++TD + T A+ ++ N K E + + LV L+ V 
Sbjct: 


209 


Query: 


234 


Sbjct: 


268 


Query: 


294 


Sbjct: 


328 


Query: 


352 


Sbjct: 


386 


Query: 


410 


Sbjct: 


444 


Score 


= 139 



209 AVPVLVSLLSSTDPDVQYYCTT-ALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRV 267 



AL 



++ + + GG+ LV L+ 



++ P + 



+1 



++ L LL 



++ K+ E S G +E 



E VL A - - S VC AA I T N I AKDQEN LAV I T DH G VV P LL S KL ANTNNN KLRH HLAEA I S R 409 
V + SCAI +AD L+++ ++ L + +N++ + A A++ 



-HKAVAP-LVRYLKSNDTNVHRATAQALYQLSE 4 53 
++ + L+R+LKS+ + QL E 



Expect = 5.0e-06, P = 5.0e-06 
Identities = 80/329 (24%), Positives = 142/329 (43%) 



Query: 37 GGITKLVALLDCAHD-STKPAQ---SSLYEARDVEVARCGALALWSCSKSHTNKEAIRKA 92 
G IT L DH+TA +L +++ + V R AL + + S N++ + A 
Sbjct: 148 GCITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNA 207 

Query: 93 GGIPLLARLLKTSHENMLIPVVGTLQECASEE-NYRAAIKAE-RIIENLVKNLNSENEQL 150 
G +P+L LL ++ ++ L A +E N + + E R++ LV ++S + ++ 
Sbjct: 208 GAVPVLVSLLSSTDPDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRV 267 

Query: 151 QEHCAMAIYQCAEDKETR-DLVRLHGGLKPLASLLNNTDNKERLAAVTGAIWKCSISKEN 209 
+ +A+ AD + ++VR GGL L L+ + D+ + A I SI N 
Sbjct: 268 KCQATLALRNLASDTSYQLEIVRA-GGLPHLVKLIQS-DSIPLVLASVACIRNISIHPLN 325 

Query: 210 VTKFREYKAIETLVGLLT-DQPEEVLVNVVGALGECCQERE-NRVIVRKCGGIQPLVNLL 267 
♦ ++ LV LL EE+ + V L ENR +G++ L 
Sbjct: 326 EGLIVDAGFLKPLVRLLDYKDSEEIQCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELA 385 

Query: 268 VG--INQALLVNVTKAVGACA-VEPESMMIIDRLDGVRLLWSLLKNPHPDVKASAAWA-L 323 
+ + + ++ A+ A A V ++ + LD + + + +N A+AA A L 
Sbjct: 386 LDSPVSVQSEISACFAILALADVSKLDLLEANILDAL-IPMTFSQNQEVSGNAAAALANL 444 

Query: 324 CPCIKN-AKDAGEMVRSFVGGLELIVNLLKSD 354 
C + N K R G ++ LKSD 
Sbjct: 445 CSRVNNYTKIIEAWDRPNEGIRGFLIRFLKSD 476 

Score = 136 (20.4 bits), Expect = 1.1e-05, P = 1.1e-05 
Identities = 72/304 (23%), Positives = 133/304 (43%) 

Query: 58 SSLYEARDVEVARCGALALWSCSKSHTNKEAIRKAGGIPLLARLLKTSHENMLIPVVGTL 117 

+ L +++ + V R AL + + S N++ + AG +P+L LL ++ ++ L 
Sbjct: 173 TKLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLSSTDPDVQYYCTTAL 232 

Query: 118 QECASEE-NYRAAIKAE-RIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETR-DLVRLH 174 

A +E N + + E R++ LV ++S + +++ +A+ AD + ++VR 
Sbjct: 233 SNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNLASDTSYQLEIVRA- 291 

Query: 175 GGLKPLASLLNNTDNKERLAAVTGAIWKCSISKENVTKFREYKAIETLVGLLT-DQPEEV 233 
" GGL L L+ + D+ + A I SI N + +♦ LV LL EE+ 

Sbjct: 292 GGLPHLVKLIQS-DSIPLVLASVACIRNISIHPLNEGLIVDAGFLKPLVRLLDYKDSEEI 350 

Query: 234 LVNVVGALGECCQERE-NRVIVRKCGGIQPLVNLLVG--INQALLVNVTKAVGACA-VEP 289 

+ V L E NR + G ++ L + ++ ++ A+ A A V 

Sbjct: 351 QCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSVQSEISACFAILALADVSK 410 

Query: 290 ESMMIIDRLDGVRLLWSLLKNPHPDVKASAAWA-LCPCIKN-AKDAGEMVRSFVGGLELI 347 

++ + LD + + + +N A+AA A LC + N K RG + 

Sbjct: 411 LDLLEANILDAL-IPMTFSQNQEVSGNAAAALANLCSRVNNYTKIIEAWDRPNEGIRGFL 469 

Query: 348 VNLLKSD 354 

+ LKSD 
Sbjct: 470 IRFLKSD 476 



Score = 114 (17.1 bits), Expect = 2.7e-03, P = 2.7e-03 
Identities = 71/335 (21%), Positives = 132/335 (39%) 



Query: 
Sbjct: 
Query: 



1 MVNILDSPHKSLKCLAAETIANVAKFKRARRWRQHGGITKLVALLDCAHDSTKPAQSSL 60 
♦ + SH++A +N+ +R++ G + LV+LL ST P 

172 LTKLAKS KH I RVQRN ATG ALLNMT HSEENRKE L VN AGA VP VLV S L LS 



--STDP- 



222 



61 YEARDVEVARCGALALWSCSKSHTNKEAIRKAGGIPLLARLLKTSHENMLIPWGTLQEC 120 
DV+ AL* + +++ KA++LL++ + L+ 



Sbjct: 


223 


Query: 


121 


Sbjct: 


279 


Query: 


181 


Sbjct: 


339 


Query: 


240 


Sbjct: 


399 


Query: 


299 


Sbjct: 


459 


Score 


= 106 



223 DVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNL 27B 



AS+ +Y+ 



LL+ D++E 



+ +LVK + S++ L 



L+ G LKPL 



+ S E N +F E A+E LDP 



+++ + + 



G+R L LK+ + 



L+ + 



NQ + N A+ C+ 



11 + 



A W + +++ D E 



Expect = 2.0e-02, P = 2.0e-02 
Identities = 49/204 (24%), Positives = 89/204 (43%) 



Query: 65 DVEVARCGALA-LWSCSKSHTNKEAIRKAGGIPLLARLLKTSHENMLIPVVGTLQECA-S 122 

+VEV +C A+ + + + NK I +G + L +L K+ H + G L S 

Sbjct: 139 NVEV-QCNAVGCITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHS 197 

Query: 123 EENYRAAIKAERIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETRD-LVRLHGGL-KPL 180 

EEN + + A + LV L+S + +Q +C A+ A D+ R L + L L 
Sbjct: 198 EENRKELVNAGAV-PVLVSLLSSTDPDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKL 256 

Query: 181 ASLLNNTDNKERLAAVTGAIWKCSISKENVTKFREYKAIETLVGLLTDQPEEVLVNVVGA 240 

SL+++ ++ + A T A+ + + + LV L+ +++ V 

Sbjct: 257 VSLMDSPSSRVKCQA-TLALRNLASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVAC 315 

Query: 241 LGECCQERENRVIVRKCGGIQPLVNLL 267 

+ N ++ G ++PLV LL 

Sbjct: 316 IRNISIHPLNEGLIVDAGFLKPLVRLL 342 



Pedant information for DKFZphtes3_35p17, frame 3 

Report for DKFZphtes3_35p17.3 

[LENGTH] 505 

[MW] 55224.34 

[pI] 8.43 

[HOMOL] PIR:S50446 VAC8 protein - yeast (Saccharomyces cerevisiae) 2e-16 

[FUNCAT] 30.25 vacuolar and lysosomal organization [S. cerevisiae, YEL013w] 8e-18 

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YEL013w] 8e-18 

[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YEL013w] 8e-18 

[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YNL189w] 3e-06 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YNL189w] 3e-06 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL189w] 3e-06 

[BLOCKS] BL01265C 

[BLOCKS] BL00242A Integrins alpha chain proteins 

[SCOP] d3bct 1.91.1.1.1 beta-Catenin [Mouse (Mus musculus)] 7e-18 

[PIRKW] cytosol 3e-11 

[PIRKW] apoptosis 3e-11 

[PIRKW] carcinogenesis 3e-11 

[PIRKW] cell adhesion 3e-11 

[PIRKW] cytoskeleton 3e-12 

[SUPFAM] pendulin 1e-07 

[KW] All_Alpha 

[KW] 3D 

[KW] LOW_COMPLEXITY 2.38 % 



SEQ MVNILDSPHKSLKCLAAETIANVAKFKRARRVVRQHGGITKLVALLDCAHDSTKPAQSSL 

SEG xxxxxxxxxxxx 

2bct- HH 

SEQ YEARDVEVARCGALALWSCSKSHTNKEAIRKAGGIPLLARLLKTSHENMLIPVVGTLQEC 

SEG 

2bct- HHCCCHHHHHHHHHHHHHHHHCHHHHHHHHHCCHHHHHHHGGGCCCHHHHHHHHHHHHHH 

SEQ ASEENYRAAIKAERIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETRDLVRLHGGLKPL 

SEG 

2bct- HHTTTHHHHHHHHCHHHHHHHHHCCCCH 



SEQ ASLLNNTDNKERLAAVTGAIWKCSISKENVTKFREYKAIETLVGLLTDQPEEVLVNVVGA 

SEG 

2bct- HHHHH-HCCCHHHHHHHHHHHHHHCCCHHHHHHHHHCHHHHHHTTTTTCCHHHHHHHHHH 

SEQ LGECCQERENRVIVRKCGGIQPLVNLLVGINQALLVNVTKAVGACAVEPESMMIIDRLDG 

SEG 

2bct- H HHHHHCCCCTTTHHHHHHHHHHHHCTTTHHHHHHHHHTTTHHHHHHHH-HHCH 

SEQ VRLLWSLLKNPHPDVKASAAWALCPCIKNAKDAGEMVRSFVGGLELIVNLLKSDNKEVLA 

SEG 

2bct- HHHHHHHHHTTTHHHHHHHHHHHHHHHCCCCHH-HHHHHHHHHHHHHHHHCTTTTTHHHH 

SEQ SVCAAITNIAKDQENLAVITDHGVVPLLSKLANTNNNKLRHHLAEAISRCCMWGRNRVAF 

SEG 

2bct- HHHHHHHHHHHCGGGHHHHHHHCHHHHHHHHHHHHHHTTTCCHHHHHHHHHHHHCHHHHH 

SEQ GEHKAVAPLVRYLKSNDTNVHRATAQALYQLSEDADNCITMHENGAVKLLLDMVGSPDQD 

SEG 

2bct- HTTTHHHHHHHHHCCCCHHHHHHHHHHHHHHHTTHHHHHHHHHCCHHHHHHHTTTTTTHH 

SEQ LQEAAAGCISNIRRLALATEKARYT 

SEG 

2bct- HHHHHHHHH 

(No Prosite data available for DKFZphtes3_35p17.3) 
(No Pfam data available for DKFZphtes3_35p17.3) 
DKFZphtes3_35p22 



group: cell cycle 

DKFZphtes3_35p22 encodes a novel 549 amino acid protein with similarity to oncogene 1 (tre-2 
locus). 

The novel protein is closely related to human tre-2 and other enzymes involved in the 
degradation of ubiquitinated proteins. The human tre-2 oncogene encodes a deubiquitinating 
enzyme, indicating a role for the ubiquitin system in mammalian growth control. 

The novel protein can find application in cancer diagnostics and treatment, and in regulating 
protein stability and growth control via regulation of ubiquitination. 



strong similarity to oncogene 1 (tre-2 locus) 
membrane regions: 1 

complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 
Locus: map="17" 
Insert length: 2072 bp 

Poly A stretch at pos. 2062, polyadenylation signal at pos. 2039 



1 GTTACACACA GGCAGTGGTA TCTGTGAGCA GCTCTGTGGA CTCAAAGGTT 
51 TTCTCCCTGA GAGGCATGAC CCAGGCCAGC TGATTCATCA GAATCAGGAT 
101 GGACGTGGTA GAGGTCGCGG GCAGTTGGTG GGCACAAGAG CGAGAGGACA 
151 TCATTATGAA ATACGAAAAG GGACACCGAG CTGGGCTGCC AGAGGACAAG 
201 GGGCCTAAGC CTTTTCGAAG CTACAACAAC AACGTCGATC ATTTGGGGAT 
251 TGTACATGAG ACGGAGCTGC CTCCTCTGAC TGCGCGGGAG GCGAAGCAAA 
301 TTCGGCGGGA GATCAGCCGA AAGAGCAAGT GGGTGGATAT GCTGGGAGAC 
351 TGGGAGAAAT ACAAAAGCAG CAGAAAGCTC ATAGATCGAG CGTACAAGGG 
401 AATGCCCATG AACATCCGGG GCCCGATGTG GTCAGTCCTC CTGAACACTG 
451 AGGAAATGAA GTTGAAAAAC CCCGGAAGAT ACCAGATCAT GAAGGAGAAG 
501 GGCAAGAAGT CATCTGAGCA CATCCAGCGC ATCGACCGGG ACGTAAGCGG 
551 GACATTAAGG AAGCATATAT TCTTCAGGGA TCGATACGGA ACCAAGCAGC 
601 GGGAACTACT CCACATCCTC CTGGCATATG AGGAGTACAA CCCGGAGGTG 
651 GGCTACTGCA GGGACCTGAG CCACATCGCC GCCTTGTTCC TCCTCTATCT 
701 TCCTGAGGAG GATGCATTCT GGGCACTGGT GCAGCTGCTG GCCAGTGAGA 
751 GGCACTCCCT GCAGGGATTT CACAGCCCAA ATGGCGGGAC CGTCCAGGGG 
801 CTCCAAGACC AACAGGAGCA TGTGGTAGCC ACGTCACAAC CCAAGACCAT 
851 GGGGCATCAG GACAAGAAAG ATCTATGTGG GCAGTGTTCC CCGTTAGGCT 
901 GCCTCATCCG GATATTGATT GACGGGATCT CTCTCGGGCT CACCCTGCGC 
951 CTGTGGGACG TGTATCTGGT AGAAGGCGAA CAGGCGCTGA TGCCGATAAC 
1001 AAGAATCGCC TTTAAGGTTC AGCAGAAGCG CCTCACGAAG ACGTCCAGGT 
1051 GTGGCCCGTG GGCACGTTTT TGCAACCGGT TCGTTGATAC CTGGGCCAGG 
1101 GATGAGGACA CTGTGCTCAA GCATCTTAGG GCCTCTATGA AGAAACTAAC 
1151 AAGAAAGAAG GGGGACCTGC CACCCCCAGC CAAACCCGAG CAAGGGTCGT 
1201 CGGCATCCAG GCCTGTGCCG GCTTCACGTG GCGGGAAGAC CCTCTGCAAG 
1251 GGGGACAGGC AGGCCCCTCC AGGCCCACCA GCCCGGTTCC CGCGGCCCAT 
1301 TTGGTCAGCT TCCCCGCCAC GGGCACCTCG TTCTTCCACA CCCTGTCCTG 
1351 GTGGGGCTGT CCGGGAAGAC ACCTACCCTG TGGGCACTCA GGGTGTGCCC 
1401 AGCCCGGCCC TGGCTCAGGG AGGACCTCAG GGTTCCTGGA GATTCCTGCA 
1451 GTGGAACTCC ATGCCCCGCC TCCCAACGGA CCTGGACGTA GAGGGCCCTT 
1501 GGTTCCGCCA TTATGATTTC AGACAGAGCT GCTGGGTCCG TGCCATATCC 
1551 CAGGAGGACC AGCTGGCCCC CTGCTGGCAG GCTGAACACC CTGCGGAGCG 
1601 GGTGAGATCG GCTTTCGCTG CACCCAGCAC TGATTCCGAC CAGGGCACCC 
1651 CCTTCAGAGC TAGGGACGAA CAGCAGTGTG CTCCCACCTC AGGGCCTTGC 
1701 CTCTGCGGCC TCCACTTGGA AAGTTCTCAG TTCCCTCCAG GCTTCTAGAA 
1751 GCATCTGGGC CAGGGCTCAT GGCTGGATAA TTTCCCTAGG CTTAACAACC 
1801 CAAGCAAGCT TCGCATCCTC GTTTTATTTT TGGTTAAACT TATGAAAATG 
1851 TATTAAGAAA GAGTGCAGCT CGAGAGAGAT TCAGAGATGG AACACACCAG 
1901 ACCCCAGATC ACAAAGCCAA CCATGCCCAG CCCCTCCCAG CACCCCCAGC 
1951 CCCACGACCA TCGTTCTGAA TTCTGACGAC ACCGTGAGCC TGCCTTTGTA 
2001 CTTCAAACTC ATGGAAGGAT AACCACCTTC ATGTTTTGAA ATAAATGTTT 
2051 CCTGTTGAAA TGAAAAAAAA AA 
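
The reported poly A stretch and polyadenylation signal positions can be located in the 3' end of the insert with a simple scan. The sketch below is only an illustration: it assumes the canonical AATAAA hexamer is the signal meant, and that the positions in the record are 0-based offsets (an inference from the numbers, not something the listing states). `TAIL` holds nucleotides 2001 to 2072 as printed above, and `polya_features` is a hypothetical helper name.

```python
import re

# Nucleotides 2001-2072 of the insert, copied from the two rows above.
TAIL = (
    "CTTCAAACTC" "ATGGAAGGAT" "AACCACCTTC" "ATGTTTTGAA" "ATAAATGTTT"
    "CCTGTTGAAA" "TGAAAAAAAA" "AA"
)


def polya_features(seq: str, offset: int = 0):
    """Return (signal_pos, tail_pos): the last AATAAA upstream of the
    terminal A run, and the start of that run, both shifted by `offset`."""
    seq = seq.upper()
    tail = re.search(r"A+$", seq)
    tail_start = tail.start() if tail else None
    upstream = seq if tail_start is None else seq[:tail_start]
    signal = upstream.rfind("AATAAA")
    return (
        offset + signal if signal != -1 else None,
        offset + tail_start if tail_start is not None else None,
    )


# TAIL starts at 0-based offset 2000 of the 2072 bp insert.
print(polya_features(TAIL, offset=2000))
```

With a 0-based offset this returns 2039 for the signal and 2062 for the start of the A run, which agrees with the positions given in the record.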



BLAST Results 



Entry AC003976 from database EMBL: 

Homo sapiens chromosome 17, clone hCIT.91_J_4, complete sequence. 
Score = 4385, P = 0.0e+00, identities = 881/886 
14 exons 

Entry HSG19723 from database EMBL: 
human STS A001W35. 
Score = 850, P = 1.9e-32, identities = 170/170 



Medline entries 



92228503: 

A novel transcriptional unit of the tre oncogene widely 
expressed in human cancer cells. 

94067315: 

The yeast DOA4 gene encodes a deubiquitinating enzyme 
related to a product of the human tre-2 oncogene. 

95176708: 

UBP5 encodes a putative yeast ubiquitin-specific protease 
that is related to the human Tre-2 oncogene product. 



Peptide information for frame 3 



ORF from 99 bp to 1745 bp; peptide length: 549 
Category: strong similarity to known protein 



1 MDVVEVAGSW WAQEREDIIM KYEKGHRAGL PEDKGPKPFR SYNNNVDHLG 

51 IVHETELPPL TAREAKQIRR EISRKSKWVD MLGDWEKYKS SRKLIDRAYK 

101 GMPMNIRGPM WSVLLNTEEM KLKNPGRYQI MKEKGKKSSE HIQRIDRDVS 

151 GTLRKHIFFR DRYGTKQREL LHILLAYEEY NPEVGYCRDL SHIAALFLLY 

201 LPEEDAFWAL VQLLASERHS LQGFHSPNGG TVQGLQDQQE HVVATSQPKT 

251 MGHQDKKDLC GQCSPLGCLI RILIDGISLG LTLRLWDVYL VEGEQALMPI 

301 TRIAFKVQQK RLTKTSRCGP WARFCNRFVD TWARDEDTVL KHLRASMKKL 

351 TRKKGDLPPP AKPEQGSSAS RPVPASRGGK TLCKGDRQAP PGPPARFPRP 

401 IWSASPPRAP RSSTPCPGGA VREDTYPVGT QGVPSPALAQ GGPQGSWRFL 

451 QWNSMPRLPT DLDVEGPWFR HYDFRQSCWV RAISQEDQLA PCWQAEHPAE 

501 RVRSAFAAPS TDSDQGTPFR ARDEQQCAPT SGPCLCGLHL ESSQFPPGF 
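
The peptide can be checked against the nucleotide sequence by translating the ORF from its reported start. The following is a minimal sketch using the standard genetic code; `translate` is an illustrative helper, and the example input is nucleotides 99 to 140 of the insert as printed in the sequence rows above.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, start: int = 1) -> str:
    """Translate from a 1-based start position until a stop codon or the
    end of the sequence; unknown codons are rendered as 'X'."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(start - 1, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Nucleotides 99-140 of the DKFZphtes3_35p22 insert.
orf_head = "ATGGACGTGGTAGAGGTCGCGGGCAGTTGGTGGGCACAAGAG"
print(translate(orf_head))
```

The translated fragment begins MDVVEVAGSWWAQE, matching the first residues of the peptide listed above.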

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35p22, frame 3 

PIR:S22155 oncogene 1 (tre-2 locus) (clone 210) - human, N = 1, Score = 
2181, P = 5.5e-226 

PIR:S57867 oncogene 1 - human, N = 1, Score = 1536, P = 1.2e-157 



>PIR:S22155 oncogene 1 (tre-2 locus) (clone 210) - human 
Length = 786 

HSPs: 

Score = 2181 (327.2 bits), Expect = 5.5e-226, P = 5.5e-226 
Identities = 405/500 (81%), Positives = 440/500 (88%) 

Query: 1 MDVVEVAGSWWAQEREDIIMKYEKGHRAGLPEDKGPKPFRSYNNNVDHLGIVHETELPPL 
MD+VE A S AQER+DI+MKY+KGHRAGLPEDKGP+P N+++D GI+HETELPP+ 
Sbjct: 1 MDMVENADSLQAQERKDILMKYDKGHRAGLPEDKGPEPV-GINSSIDRFGILHETELPPV 

Query: 61 TAREAKQIRREISRKSKWVDMLGDWEKYKSSRKLIDRAYKGMPMNIRGPMWSVLLNTEEM 
TAREAK+IRRE++R SKW++MLG+WE YK S KLIDR YKG+PMNIRGP+WSVLLN +E+ 
Sbjct: 60 TAREAKKIRREMTRTSKWMEMLGEWETYKHSSKLIDRVYKGIPMNIRGPVWSVLLNIQEI 

Query: 121 KLKNPGRYQIMKEKGKKSSEHIQRIDRDVSGTLRKHIFFRDRYGTKQRELLHILLAYEEY 
KLKNPGRYQIMKE+GK+SSEHI ID DV TLR H+FFRDRYG KQREL +ILLAY EY 
Sbjct: 120 KLKNPGRYQIMKERGKRSSEHIHHIDLDVRTTLRNHVFFRDRYGAKQRELFYILLAYSEY 

Query: 181 NPEVGYCRDLSHIAALFLLYLPEEDAFWALVQLLASERHSLQGFHSPNGGTVQGLQDQQE 
NPEVGYCRDLSHI ALFLLYLPEEDAFWALVQLLASERHSL GFHSPNGGTVQGLQDQQE 
Sbjct: 180 





Query: 241 HVVATSQPKTMGHQDKKDLCGQCSPLGCLIRILIDGISLGLTLRLWDVYLVEGEQALMPI 300 

HVV SQPKTM HQDK+ LCGQC+ LGCL+R LIDGISLGLTLRLWDVYLVEGEQ LMPI 
Sbjct: 240 HVVPKSQPKTMWHQDKEGLCGQCASLGCLLRNLIDGISLGLTLRLWDVYLVEGEQVLMPI 299 

Query: 301 TRIAFKVQQKRLTKTSRCGPWARFCNRFVDTWARDEDTVLKHLRASMKKLTRKKGDLPPP 360 

T IA KVQQKRL KTSRCG WAR N+F DTWA ++DTVLKHLRAS KKLTRK+GDLPPP 
Sbjct: 300 TSIALKVQQKRLMKTSRCGLWARLRNQFFDTWAMNDDTVLKHLRASTKKLTRKQGDLPPP 359 

Query: 361 AKPEQGSSASRPVPASRGGKTLCKGDRQAPPGPPARFPRPIWSASPPRAPRSSTPCPGGA 420 

AK EQGS A RPVPASRGGKTLCKG RQAPPGPPA+F RPI SASPP A R STPCPGGA 
Sbjct: 360 AKREQGSLAPRPVPASRGGKTLCKGYRQAPPGPPAQFQRPICSASPPWASRFSTPCPGGA 419 

Query: 421 VREDTYPVGTQGVPSPALAQGGPQGSWRFLQWNSMPRLPTDLDVEGPWFRHYDFRQSCWV 480 

VREDTYPVGTQGVPS ALAQGGPQGSWRFL+W SMPRLPTDLD+ GPWF HYDF +SCWV 
Sbjct: 420 VREDTYPVGTQGVPSLALAQGGPQGSWRFLEWKSMPRLPTDLDIGGPWFPHYDFERSCWV 479 

Query: 481 RAISQEDQLAPCWQAEHPAE 500 

RAISQEDQLA CWQAEH E 
Sbjct: 480 RAISQEDQLATCWQAEHCGE 499 

Pedant information for DKFZphtes3_35p22, frame 3 

Report for DKFZphtes3_35p22.3 

[LENGTH] 549 

[MW] 62159.16 

[pI] 9.23 

[HOMOL] PIR:S22155 oncogene 1 (tre-2 locus) (clone 210) - human 0.0 

[FUNCAT] 11.01 stress response [S. cerevisiae, YGR100w] 2e-16 

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YGR100w] 2e-16 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YNL293w] 3e-15 

[PIRKW] transmembrane protein 6e-14 

[PROSITE] MYRISTYL 6 

[PROSITE] AMIDATION 1 

[PROSITE] CAMP_PHOSPHO_SITE 3 

[PROSITE] CK2_PHOSPHO_SITE 4 

[PROSITE] TYR_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 10 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 5.28 % 



SEQ MDVVEVAGSWWAQEREDIIMKYEKGHRAGLPEDKGPKPFRSYNNNVDHLGIVHETELPPL 

SEG 

PRD ccceeeccchhhhhhhhhhhhhhccccccccccccccceeeeeccccccccccccccccc 

MEM 

SEQ TAREAKQIRREISRKSKWVDMLGDWEKYKSSRKLIDRAYKGMPMNIRGPMWSVLLNTEEM 

SEG 

PRD chhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhcccccccccceeeccccccc 

MEM 

SEQ KLKNPGRYQIMKEKGKKSSEHIQRIDRDVSGTLRKHIFFRDRYGTKQRELLHILLAYEEY 

SEG 

PRD ccccccchhhhhhhccccchhhhhhhhhhhhccccccccccccccchhhhhhhhhhhhhc 

MEM 

SEQ NPEVGYCRDLSHIAALFLLYLPEEDAFWALVQLLASERHSLQGFHSPNGGTVQGLQDQQE 

SEG 

PRD ccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhh 

MEM 

SEQ HVVATSQPKTMGHQDKKDLCGQCSPLGCLIRILIDGISLGLTLRLWDVYLVEGEQALMPI 

SEG 

PRD hhhhhhhchhhhhhhhccccccccchhhhhhhhhhccccchhhhhhhhhccccceeeehh 

MEM MMMMMMMMMMMMMMMMMM 

SEQ TRIAFKVQQKRLTKTSRCGPWARFCNRFVDTWARDEDTVLKHLRASMKKLTRKKGDLPPP 

SEG 

PRD hhhhhhhhhhhhhhhcccchhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhcccc 

MEM 

SEQ AKPEQGSSASRPVPASRGGKTLCKGDRQAPPGPPARFPRPIWSASPPRAPRSSTPCPGGA 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx. . . 

PRD ccccccccccccccccccceeeeccccccccccccccccccccccccccccccccccccc 

MEM 
SEQ VREDTYPVGTQGVPSPALAQGGPQGSWRFLQWNSMPRLPTDLDVEGPWFRHYDFRQSCWV 

SEG 

PRD cccccccccccccccccccccccccceeeeeccccccccccccccccccccccccccccc 

MEM 

SEQ RAISQEDQLAPCWQAEHPAERVRSAFAAPSTDSDQGTPFRARDEQQCAPTSGPCLCGLHL 

SEG 

PRD cchhhhhhhhhhhhhhcchhhhhhhhccccccccccccccchhhhhcccccccccceeee 

MEM 

SEQ ESSQFPPGF 

SEG 

PRD ccccccccc 

MEM 
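
The `[KW] LOW_COMPLEXITY` percentages in the PEDANT reports appear to be the share of residues masked with `x` in the SEG rows. The sketch below illustrates that reading; it is an inference from the numbers in this section, and the function names are hypothetical. The reported values are consistent with 12, 29 and 6 masked residues for the 505, 549 and 497 residue proteins.

```python
def low_complexity_percent(masked: int, length: int) -> float:
    """Percentage of masked residues, rounded to two decimals."""
    return round(100.0 * masked / length, 2)


def masked_residues(seg_rows) -> int:
    """Count 'x' characters across the SEG rows of a report."""
    return sum(row.count("x") for row in seg_rows)


# (masked residues, protein length) pairs consistent with the three reports
for masked, length in [(12, 505), (29, 549), (6, 497)]:
    print(masked, length, low_complexity_percent(masked, length))
```

This reproduces the 2.38 %, 5.28 % and 1.21 % values given in the reports.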



Prosite for DKFZphtes3_35p22.3 

PS00004 136->140 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 310->314 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 348->352 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 61->64 PKC_PHOSPHO_SITE PDOC00005 
PS00005 73->76 PKC_PHOSPHO_SITE PDOC00005 
PS00005 90->93 PKC_PHOSPHO_SITE PDOC00005 
PS00005 152->155 PKC_PHOSPHO_SITE PDOC00005 
PS00005 216->219 PKC_PHOSPHO_SITE PDOC00005 
PS00005 282->285 PKC_PHOSPHO_SITE PDOC00005 
PS00005 315->318 PKC_PHOSPHO_SITE PDOC00005 
PS00005 346->349 PKC_PHOSPHO_SITE PDOC00005 
PS00005 351->354 PKC_PHOSPHO_SITE PDOC00005 
PS00005 446->449 PKC_PHOSPHO_SITE PDOC00005 
PS00006 61->65 CK2_PHOSPHO_SITE PDOC00006 
PS00006 460->464 CK2_PHOSPHO_SITE PDOC00006 
PS00006 484->488 CK2_PHOSPHO_SITE PDOC00006 
PS00006 511->515 CK2_PHOSPHO_SITE PDOC00006 
PS00007 93->100 TYR_PHOSPHO_SITE PDOC00007 
PS00007 92->100 TYR_PHOSPHO_SITE PDOC00007 
PS00008 8->14 MYRISTYL PDOC00008 
PS00008 101->107 MYRISTYL PDOC00008 
PS00008 230->236 MYRISTYL PDOC00008 
PS00008 276->282 MYRISTYL PDOC00008 
PS00008 366->372 MYRISTYL PDOC00008 
PS00008 441->447 MYRISTYL PDOC00008 
PS00009 134->138 AMIDATION PDOC00009 

(No Pfam data available for DKFZphtes3_35p22.3) 
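
Short Prosite site motifs such as those in the table above can be re-derived with regular expressions. The sketch below is illustrative only. It assumes the commonly documented patterns for three of the entries (PS00004 `[RK](2)-x-[ST]`, PS00005 `[ST]-x-[RK]`, PS00006 `[ST]-x(2)-[DE]`) and reports 1-based start positions; `scan_sites` is a hypothetical helper, and the example window is residues 131 to 160 of the peptide.

```python
import re

# Prosite-style patterns rewritten as regular expressions (assumed definitions).
PATTERNS = {
    "CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",
}


def scan_sites(seq: str, offset: int = 1):
    """Return (name, start, matched_text) for every pattern occurrence.
    A lookahead is used so that overlapping matches are all reported;
    `offset` is the sequence position of the first character of `seq`."""
    hits = []
    for name, pattern in PATTERNS.items():
        for m in re.finditer("(?=(" + pattern + "))", seq):
            hits.append((name, m.start() + offset, m.group(1)))
    return sorted(hits, key=lambda h: h[1])


# Residues 131-160 of the DKFZphtes3_35p22 frame 3 peptide.
window = "MKEKGKKSSEHIQRIDRDVSGTLRKHIFFR"
for hit in scan_sites(window, offset=131):
    print(hit)
```

For this window the scan reports a CAMP_PHOSPHO_SITE starting at 136 and a PKC_PHOSPHO_SITE starting at 152, in line with the start positions in the table.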
DKFZphtes3_4b4 



group: testes derived 

DKFZphtes3_4b4 encodes a novel 497 amino acid protein similar to SCP proteins and a human 
trypsin inhibitor. 

The novel protein contains an extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signature 2, 
predicted by Prosite and Pfam. This domain is found in a variety of extracellular proteins 
from eukaryotes that have been found to be evolutionarily related. The exact function of these 
proteins is not yet known. In addition, the protein is similar to a human trypsin inhibitor. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes or as a new protease inhibitor. 


strong similarity to trypsin inhibitor 
might be a new protease inhibitor? 
Sequenced by AGOWA 

Locus: /map="333.4 cR from top of Chr16 linkage group" 
Insert length: 4574 bp 

Poly A stretch at pos. 4551, polyadenylation signal at pos. 4539 



1 GGCGGCTGCT CCCATTGAGC TGTCTGCTCG CTGTGCCCGC TGTGCCTGCT 
51 GTGCCCGCGC TGTCGCCGCT GCTACCGCGT CTGCTGGACG CGGGAGACGC 
101 CAGCGAGCTG GTGATTGGAG CCCTGCGGAG AGCTCAAGCG CCCAGCTCTG 
151 CCCGAGGAGC CCAGGCTGCC CCGTGAGTCC CATAGTTGCT GCAGGAGTGG 
201 AGCCATGAGC TGCGTCCTGG GTGGTGTCAT CCCCTTGGGG CTGCTGTTCC 
251 TGGTCTGCGG ATCCCAAGGC TACCTCCTGC CCAACGTCAC TCTCTTAGAG 
301 GAGCTGCTCA GCAAATACCA GCACAACGAG TCTCACTCCC GGGTCCGCAG 
351 AGCCATCCCC AGGGAGGACA AGGAGGAGAT CCTCATGCTG CACAACAAGC 
401 TTCGGGGCCA GGTGCAGCCT CAGGCCTCCA ACATGGAGTA CATGACCTGG 
451 GATGACGAAC TGGAGAAGTC TGCTGCAGCG TGGGCCAGTC AGTGCATCTG 
501 GGAGCACGGG CCCACCAGTC TGCTGGTGTC CATCGGGCAG AACCTGGGCG 
551 CTCACTGGGG CAGGTATCGC TCTCCGGGGT TCCATGTGCA GTCCTGGTAT 
601 GACGAGGTGA AGGACTACAC CTACCCCTAC CCGAGCGAGT GCAACCCCTG 
651 GTGTCCAGAG AGGTGCTCGG GGCCTATGTG CACGCACTAC ACACAGATAG 
701 TTTGGGCCAC CACCAACAAG ATCGGTTGTG CTGTGAACAC CTGCCGGAAG 
751 ATGACTGTCT GGGGAGAAGT TTGGGAGAAC GCGGTCTACT TTGTCTGCAA 
801 TTATTCTCCA AAGGGGAACT GGATTGGAGA AGCCCCCTAC AAGAATGGCC 
851 GGCCCTGCTC TGAGTGCCCA CCCAGCTATG GAGGCAGCTG CAGGAACAAC 
901 TTGTGTTACC GAGAAGAAAC CTACACTCCA AAACCTGAAA CGGACGAGAT 
951 GAATGAGGTG GAAACGGCTC CCATTCCTGA AGAAAACCAT GTTTGGCTCC 
1001 AACCGAGGGT GATGAGACCC ACCAAGCCCA AGAAAACCTC TGCGGTCAAC 
1051 TACATGACCC AAGTCGTCAG ATGTGACACC AAGATGAAGG ACAGGTGCAA 
1101 AGGGTCCACG TGTAACAGGT ACCAGTGCCC AGCAGGCTGC CTGAACCACA 
1151 AGGCGAAGAT CTTTGGAACT CTGTTCTATG AAAGCTCGTC TAGCATATGC 
1201 CGCGCCGCCA TCCACTACGG GATCCTGGAT GACAAGGGAG GCCTGGTGGA 
1251 TATCACCAGG AACGGGAAGG TCCCCTTCTT CGTGAAGTCT GAGAGACACG 
1301 GCGTGCAGTC CCTCAGCAAA TACAAACCTT CCAGCTCATT CATGGTGTCA 
1351 AAAGTGAAAG TGCAGGATTT GGACTGCTAC ACGACCGTTG CTCAGCTGTG 
1401 CCCGTTTGAA AAGCCAGCAA CTCACTGCCC AAGAATCCAT TGTCCGGCAC 
1451 ACTGCAAAGA CGAACCTTCC TACTGGGCTC CGGTGTTTGG AACCAACATC 
1501 TATGCAGATA CCTCAAGCAT CTGCAAGACA GCCGTGCACG CGGGAGTCAT 
1551 CAGCAACGAG AGTGGGGGTG ACGTGGACGT GATGCCCGTG GATAAAAAGA 
1601 AGACCTACGT GGGCTCGCTC AGGAATGGAG TTCAGTCTGA AAGCCTGGGG 
1651 ACTCCTCGGG ATGGAAAGGC CTTCCGGATC TTTGCTGTCA GGCAGTGAAT 
1701 TTCCAGCACC AGGGGAGAAG GGGCGTCTTC AGGAGGGCTT CGGGGTTTTG 
1751 CTTTTATTTT TATTTTGTCA TTGCGGGGTA TATGGAGAGT CAGGAAACTT 
1801 CCTTTGACTG ATGTTCAGTG TCCATCACTT TGTGGCCTGT GGGTGAGGTG 
1851 ACATCTCATC CCCTCACTGA AGCAACAGCA TCCCAAGGTG CTCAGCCGGA 
1901 CTCCCTGGTG CCTGATCCTG CTGGGGCCCG GGGGTCTCCA TCTGGACGTC 
1951 CTCTCTCCTT TAGAGATCTG AGCTGTCTCT TAAAGGGGAC AGTTGCCCAA 
2001 AATGTTCCTT GCTATGTGTT CTTCTGTTGG TGGAGGAAGT TGATTTCAAC 
2051 CTCCCTGCCA AAAGAACAAA CCATTTGAAG CTCACAATTG TGAAGCATTC 
2101 ACGGCGTCGG AAGAGGCCTT TTGAGCAAGC GCCAATGAGT TTCAGGAATG 
2151 AAGTAGAAGG TAGTTATTTA AAAATAAAAA ACACAGTCCG TCCCTACCAA 
2201 TAGAGGAAAA TGGTTTTAAT GTTTGCTGGT CAGACAGACA AATGGGCTAG 
2251 AGTAAGAGGG CTGCGGGTAT GAGAGACCCC GGCTCCGCCC TGGCACGTGT 
2301 CCTTGCTGGC GGCCCGCCAC AGGCCCCCTT CAATGGCCGC ATTCAGGATG 
2351 GCTCTATACA CAGCAGTGCT GGTTTATGTA GAGTTCAGCA GTCACTTCAG 
2401 AGATGTATCT TGTCTTTGTC AGGCCCTTCA TCTTCATGGC CCACCTGTTT 
2451 TCTGCCGTGA CCTTTGGTCC CATTGAGGAC TAAGGATCGG GACCCTTTCT 
2501 TTACCCCCTA CCCATTGTGG CTCCCACCCT GCCTCGGACT GGTTTACGTG 
2551 TCCTGGTTCA CACCCAGGAC TTTTCTTTGC AAGCGAACCT GTTTGAAGCC 
2601 CAAGTCTTAA CTCCTGGTCT CGTAAGGTTC CACTGAGACG AGATGTCTGA 
2651 GAACAACCAA AGAAGGCCTG CTCTTTGCTG CTTTTAAAAA ATGACAATTA 
2701 AATGTGCAGA TTCCCCACGC ACCCGATGAC CTATTTTTTC AGCCGTGGGA 
2751 GGAATGGAGT CTTTGGTACA TTCCTCACCG AGGTTAGCAG CTCAGTTTGT 
2801 GGTTATGAAA CCGTCTGTGG CCTCATGACA GCGAGAGATG GGAATACACT 
2851 AGAAGGATCT CTTTTCCTGT TTTCGTGAAA CGACTCTTGC CAAACGTTCC 
2901 CGAGGCGCCA AGGAGTGTAG TACACCCTGG CTGCCATCAC TCTATAAAAG 
2951 TGCTTCATGA GCCCAGACCA AAAGCCCACA GTGAAATGAA GTACCCTTTT 
3001 GTAAATAGCA TTTTTTTGCA GAAGGTGAAA ATTCCACTCT CTACCACCGG 
3051 GCCAGCCAAT AGATCACTTT GGTGAATGCT AGTTTCAAAT TTGATTCAAA 
3101 ATATTTCTTA GGTGAAAGAA CTAGCAGAAA GTCAAAAACT AAGATACTGT 
3151 AGACTGGACA AGAAATTCTA CCTGGGCACC TAGGTGATGC CTTCTTTCTT 
3201 TGATTGCCTT TCTAATAAAT GCAGAATCTG AAGGTAAATA GGTTTAAAAC 
3251 AAAACAAAAA CCCACCCCTT TAAGGAGTTG GTAAAAAGCA GTTCAACTCT 
3301 TAGCTTGACT GAGCTAAAAT TCACAGGACT ACGTGCTTTG TGCATTGTAG 
3351 TCTAGTCGTA ATTCATAGGT ACTGACTCCT CAGCCCCAAA TGTCGGAGAG 
3401 GAAGAATTCG GTCAGCCTGT CAGGTCGTGA GTCCAGTTAC CACCAAACAT 
3451 CTGGGAAACT TCTGGGTGCT GGGTGCTCTG CTGCTGGACT TTTGTGGCTG 
3501 TGTCTGTGTC TGCAAGATAA ATTAGATCGC CCTGTGGGGT TTGCAGAATT 
3551 AGTGAAGGGT CCAGGACGAT CCCAGTGGGC TCGCTTCCAA AGCATCCCAC 
3601 TCAAGGGAGA CTTGAAACTT CCAGTGTGAG TTGACCCCAT CATTTAAAAA 
3651 TAAAGTCCCC GGGTTCCTTA ATGCCTCCTT CACTGGGCCT TCCTAGCAGG 
3701 ATAGAAAGTC CTTGCCCAGA GCAGGACCTG GCTGTCTTTT TTTTTTTTTT 
3751 TTTCCCGAGA CCAAGTTTCA CTCTGTTGCC CAAGGTAGAG TGCAGTGGCG 
3801 TGATCTCTGC TCATTGCAAC TGCCGCCTCC CGGGTTCAAG CAATTCTCAT 
3851 GCATCAGCCT CCCAAGTACC TGGGACTACA GGCGTGAGCT ACCATGCCCG 
3901 GCTAATTTTT GTATTTTTAG TAGAGATGGG GTTTCATTAT GTTGGCCAGG 
3951 CTGGTCTCGA ACTCCTTACC TCAGGTGATC CACCCACCTT GGCCTCCCGA 
4001 AGTGCTGGGA TTACAGGCAT GAGCCACTGC GCCCGGCCAT GGACCTGGCT 
4051 GTCTTTATCA TCCCCACAAA CATTTTGAAA CTGGAATATT TGTCTTCAGA 
4101 AAATGGAAAC AAGACTATAA ATGATAAGCC CTGTCCCTAG CACCACCTCT 
4151 CCTGTGTGTG GAATAGAGGC CCCTCGTGCT ACCAACACTT ACCCTGTGTT 
4201 TAAAAAGATC TTGTACCAAG CCAACGGCGT TCCTGGCTCT CCTGCCCACA 
4251 GGATGAACAT TTTCGGCTTC CTTAGGAGTT TTGCCCTACC GTATTCCAAA 
4301 GCGTGTGCTG GTTTCTCATA TTGTCTGTAG GCTCACTCAG CCCGCAGTTT 
4351 ATGTGTGTGC TTTTTTCTAT GAAAAATGAT GTATTTTGCT ACTTCCTGTG 
4401 TACAAAGTTT TATTGTAAAT GTTTTTTGTG CTTTGCATGA ACAGGGGCCA 
4451 CGTTGTTGCA ATTGTTTCAG TAGAACTGGT TTGATTTCTA AAATGTTCCT 
4501 GTAACATATC TTTTATGAAC AAATCTGAAC AATTTGTGAA ATAAAACATT 
4551 GAAAACCAAA AAAAAAAAAA AAAA 



BLAST Results 



Entry HS834352 from database EMBL: 
human STS WI-15502. 
Score = 1331, P = 5.4e-54, identities = 287/301 



Medline entries 



98146272: 

cDNA cloning of a novel trypsin inhibitor with similarity to 

pathogenesis-related proteins, and its 

frequent expression in human brain cancer cells. 



Peptide information for frame 1 



ORF from 205 bp to 1695 bp; peptide length: 497 
Category: strong similarity to known protein 



1 MSCVLGGVIP LGLLFLVCGS QGYLLPNVTL LEELLSKYQH NESHSRVRRA 
51 IPREDKEEIL MLHNKLRGQV QPQASNMEYM TWDDELEKSA AAWASQCIWE 
101 HGPTSLLVSI GQNLGAHWGR YRSPGFHVQS WYDEVKDYTY PYPSECNPWC 
151 PERCSGPMCT HYTQIVWATT NKIGCAVNTC RKMTVWGEVW ENAVYFVCNY 
201 SPKGNWIGEA PYKNGRPCSE CPPSYGGSCR NNLCYREETY TPKPETDEMN 
251 EVETAPIPEE NHVWLQPRVM RPTKPKKTSA VNYMTQVVRC DTKMKDRCKG 
301 STCNRYQCPA GCLNHKAKIF GTLFYESSSS ICRAAIHYGI LDDKGGLVDI 
351 TRNGKVPFFV KSERHGVQSL SKYKPSSSFM VSKVKVQDLD CYTTVAQLCP 
401 FEKPATHCPR IHCPAHCKDE PSYWAPVFGT NIYADTSSIC KTAVHAGVIS 
451 NESGGDVDVM PVDKKKTYVG SLRNGVQSES LGTPRDGKAF RIFAVRQ 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4b4, frame 1 

TREMBLNEW:AF109674_1 gene: "Lgl1"; product: "late gestation lung 
protein 1"; Rattus norvegicus late gestation lung protein 1 (Lgl1) 
mRNA, complete cds., N = 1, Score = 968, P = 1.9e-97 

TREMBL:D45027_1 product: "25 kDa trypsin inhibitor"; Homo sapiens mRNA 
for 25 kDa trypsin inhibitor, complete cds., N = 1, Score = 738, P = 
4.5e-73 

TREMBL:AB009609_1 gene: "HrTT-1"; Halocynthia roretzi HrTT-1 mRNA, 
complete cds., N = 1, Score = 345, P = 2e-31 

PIR:JC5308 testis-specific, vespid, and pathogenesis-related protein 1 
precursor - human, N = 1, Score = 337, P = 1.7e-30 



>TREMBLNEW:AF109674_1 gene: "Lgl1"; product: "late gestation lung protein 
1"; Rattus norvegicus late gestation lung protein 1 (Lgl1) mRNA, complete 
cds. 

Length = 188 

HSPs: 

Score = 968 (145.2 bits), Expect = 1.9e-97, P = 1.9e-97 
Identities = 160/185 (86%), Positives = 170/185 (91%) 



Query: 61 MLHNKLRGQVQPQASNMEYMTWDDELEKSAAAWASQCIWEHGPTSLLVSIGQNLGAHWGR 120 

MLHNKLRGQV P ASNMEYMTWD+ELE+SAAAWA +C+WEHGP SLLVSIGQNL HWGR 
Sbjct: 1 MLHNKLRGQVYPPASNMEYMTWDEELERSAAAWAQRCLWEHGPASLLVSIGQNLAVHWGR 60 

Query: 121 YRSPGFHVQSWYDEVKDYTYPYPSECNPWCPERCSGPMCTHYTQIVWATTNKIGCAVNTC 180 

YRSPGFHVQSWYDEVKDYTYPYP ECNPWCPERCSG MCTHYTQ+VWATTNKIGCAV+TC 
Sbjct: 61 YRSPGFHVQSWYDEVKDYTYPYPHECNPWCPERCSGAMCTHYTQMVWATTNKIGCAVKTC 120 

Query: 181 RKMTVWGEVWENAVYFVCNYSPKGNWIGEAPYKNGRPCSECPPSYGGSCRNNLCYREETY 240 

R M+VWG++WENAVY VCNYSPKGNWIGEAPYK+GRPCSECP SYGG CRNNLCYREE Y 
Sbjct: 121 RSMSVWGDIWENAVYLVCNYSPKGNWIGEAPYKHGRPCSECPSSYGGGCRNNLCYREEHY 180 

Query: 241 TPKPE 245 
KPE 

Sbjct: 181 HQKPE 185 

Pedant information for DKFZphtes3_4b4, frame 1 



Report for DKFZphtes3_4b4.1 

[LENGTH] 497 

[MW] 55920.00 

[pI] 8.36 

[HOMOL] TREMBL:D45027_1 product: "25 kDa trypsin inhibitor"; Homo sapiens mRNA for 25 
kDa trypsin inhibitor, complete cds. 6e-78 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YJL078c] 8e-12 

[BLOCKS] BL01009E Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins 

[BLOCKS] BL01009D Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins 

[BLOCKS] BL01009C Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins 

[BLOCKS] BL01009A Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins 

[PIRKW] glycoprotein 5e-22 

[PIRKW] blocked amino end 5e-13 

[PIRKW] brain 9e-30 

[PIRKW] hydrolase 4e-09 

[PIRKW] hemolymph coagulation 4e-09 

[PIRKW] zymogen 4e-09 

[PIRKW] alternative splicing 4e-09 

[PIRKW] sperm 5e-22 

[PIRKW] viroid-induced protein 2e-11 

[PIRKW] venom 6e-18 

[PIRKW] pyroglutamic acid 2e-11 

[PIRKW] transmembrane protein 2e-10 

[PIRKW] serine proteinase 4e-09 

[SUPFAM] C-type lectin homology 4e-09 

[SUPFAM] trypsin homology 4e-09 

[SUPFAM] complement factor H repeat homology 4e-09 

[SUPFAM] cysteine-rich secretory protein 1 6e-24 

[SUPFAM] pathogenesis-related leaf protein 7e-15 

[PROSITE] MYRISTYL 8 

[PROSITE] CAMP_PHOSPHO_SITE 3 

[PROSITE] CK2_PHOSPHO_SITE 6 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 8 

[PROSITE] ASN_GLYCOSYLATION 3 

[PROSITE] SCP_AG5_PR1_SC7_2 1 

[PFAM] SCP-like extracellular Proteins 

[KW] All_Beta 

[KW] SIGNAL_PEPTIDE 23 

[KW] LOW_COMPLEXITY 1.21 % 



SEQ MSCVLGGVIPLGLLFLVCGSQGYLLPNVTLLEELLSKYQHNESHSRVRRAIPREDKEEIL 

SEG xxxxxx 

PRD ccceeeeeceeeeeeeecccccccccchhhhhhhhhhhhhcccchhhhhhhccchhhhhh

SEQ MLHNKLRGQVQPQASNMEYMTWDDELEKSAAAWASQCIWEHGPTSLLVSIGQNLGAHWGR

SEG 

PRD hhhhhhhcccccccccchhhhhhhhhhhhhhhhhhhhcccccccccccccccceeeeecc 

SEQ YRSPGFHVQSWYDEVKDYTYPYPSECNPWCPERCSGPMCTHYTQIVWATTNKIGCAVNTC 

SEG 

PRD ccccchhhhhhhhhhhccccccccccccccccccccccccceeeeeeeccccccceeeec 

SEQ RKMTVWGEVWENAVYFVCNYSPKGNWIGEAPYKNGRPCSECPPSYGGSCRNNLCYREETY 

SEG 

PRD cccccccccccceeeeeeeccccccccccccccccccccccccccccccccccccccccc 

SEQ TPKPETDEMNEVETAPIPEENHVWLQPRVMRPTKPKKTSAVNYMTQVVRCDTKMKDRCKG 

SEG 

PRD cccccccccccccccccccceeeeecccccccccccceeeeeeeeeeeeecccccccccc 

SEQ STCNRYQCPAGCLNHKAKIFGTLFYESSSSICRAAIHYGILDDKGGLVDITRNGKVPFFV 

SEG 

PRD ccccccccccccccccceeeeeeeeecccceeeeeccccccccccceeeeeccccceeee 

SEQ KSERHGVQSLSKYKPSSSFMVSKVKVQDLDCYTTVAQLCPFEKPATHCPRIHCPAHCKDE 

SEG 

PRD eccceeeeeeeeccccceeeeeeeeeecccceeeeeeeeccccccccccccccccccccc 

SEQ PSYWAPVFGTNIYADTSSICKTAVHAGVISNESGGDVDVMPVDKKKTYVGSLRNGVQSES 

SEG 

PRD ccceeeeeceeeccccceeeeeeeeccccccccccccceeecccceeeeeecccceeeee 

SEQ LGTPRDGKAFRIFAVRQ

SEG 

PRD ccccccccceeeeeccc 
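
The LOW_COMPLEXITY percentage in the report appears to be the share of residues masked with `x` on the SEG rows. The sketch below assumes exactly that relationship (it is inferred from the figures shown here, not from PEDANT documentation); `masked_percent` is a hypothetical helper name.

```python
def masked_percent(seg_lines, protein_length):
    """Percentage of residues flagged 'x' across all SEG mask rows."""
    if protein_length <= 0:
        raise ValueError("protein length must be positive")
    masked = sum(line.count("x") for line in seg_lines)
    return round(100.0 * masked / protein_length, 2)


# DKFZphtes3_4b4.1: a single masked run of six residues in a 497-residue protein;
# the remaining SEG rows are empty.
seg_rows = ["xxxxxx"] + [""] * 8
print(masked_percent(seg_rows, 497))
```

The same calculation can be applied to the COILS rows to cross-check a COILED_COIL percentage, again under the assumption that each marked column counts as one residue.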



Prosite for DKFZphtes3_4b4.1

PS00001  27->31    ASN_GLYCOSYLATION  PDOC00001
PS00001  41->45    ASN_GLYCOSYLATION  PDOC00001
PS00001  451->455  ASN_GLYCOSYLATION  PDOC00001
PS00004  181->185  CAMP_PHOSPHO_SITE  PDOC00004
PS00004  276->280  CAMP_PHOSPHO_SITE  PDOC00004
PS00004  464->468  CAMP_PHOSPHO_SITE  PDOC00004
PS00005  170->173  PKC_PHOSPHO_SITE   PDOC00005
PS00005  179->182  PKC_PHOSPHO_SITE   PDOC00005
PS00005  201->204  PKC_PHOSPHO_SITE   PDOC00005
PS00005  228->231  PKC_PHOSPHO_SITE   PDOC00005
PS00005  241->244  PKC_PHOSPHO_SITE   PDOC00005
PS00005  362->365  PKC_PHOSPHO_SITE   PDOC00005
PS00005  471->474  PKC_PHOSPHO_SITE   PDOC00005
PS00005  483->486  PKC_PHOSPHO_SITE   PDOC00005
PS00006  29->33    CK2_PHOSPHO_SITE   PDOC00006
PS00006  75->79    CK2_PHOSPHO_SITE   PDOC00006
PS00006  81->85    CK2_PHOSPHO_SITE   PDOC00006
PS00006  130->134  CK2_PHOSPHO_SITE   PDOC00006
PS00006  453->457  CK2_PHOSPHO_SITE   PDOC00006
PS00006  483->487  CK2_PHOSPHO_SITE   PDOC00006
PS00007  385->393  TYR_PHOSPHO_SITE   PDOC00007
PS00008  111->117  MYRISTYL           PDOC00008
PS00008  115->121  MYRISTYL           PDOC00008
PS00008  174->180  MYRISTYL           PDOC00008
PS00008  204->210  MYRISTYL           PDOC00008
PS00008  227->233  MYRISTYL           PDOC00008
PS00008  300->306  MYRISTYL           PDOC00008
PS00008  447->453  MYRISTYL           PDOC00008
PS00008  470->476  MYRISTYL           PDOC00008
PS01010  195->207  SCP_AG5_PR1_SC7_2  PDOC00772
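
The `start->end` ranges in the Prosite listing can be mapped back onto the peptide. The sketch below assumes the ranges are 1-based with an exclusive end, which is an inference from the first two ASN_GLYCOSYLATION hits rather than a documented convention. The regular expression is the PROSITE PS00001 pattern N-{P}-[ST]-{P}; the helper name `site` is hypothetical.

```python
import re

# First 60 residues of the DKFZphtes3_4b4.1 peptide, as listed in the report.
SEQ = "MSCVLGGVIPLGLLFLVCGSQGYLLPNVTLLEELLSKYQHNESHSRVRRAIPREDKEEIL"

# PROSITE PS00001 (N-glycosylation site) written as a regular expression.
N_GLYC = re.compile(r"N[^P][ST][^P]")


def site(seq, start, end):
    """Residues for a 'start->end' hit, assuming 1-based start and exclusive end."""
    return seq[start - 1:end - 1]


for start, end in [(27, 31), (41, 45)]:
    motif = site(SEQ, start, end)
    print(start, end, motif, bool(N_GLYC.fullmatch(motif)))
```

Under this reading both hits yield four-residue motifs that satisfy the pattern, which is what one would expect for PS00001.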



Pfam for DKFZphtes3_4b4.1

HMM_NAME SCP-like extracellular Proteins

HMM        *PQDEQDEWLNkHNDFRQQVGRGLETRGNPGPQPPAsNMnPMVWNDELAt
           P + ++E+L HN +R QV P ASNM M+W+DEL +
Query  52  PREDKEEILMLHNKLRGQVQ PQASNMEYMTWDDELEK 88

HMM        IAQnWANQCiFDHHDCCWNHsnYPYGQNIAWWSsTANnPWnWssMIQMWY
           A WA+QCI +H ++ + S GQN+ + + ++++ +Q+WY
Query  89  SAAAWASQCIWEHGPTSLLVSI GQNLGAHWG RYRSPGFHVQSWY 132

HMM        NEvkDYNYNWNTCkGG NNFmVCGHYTQMVWRnTf rIGCGRYICYC
           + EVKDY Y + + +C HYTQ+VW+ T +IGC C+
Query 133  DEVKDYTYPYPSECNPWCPERCSGPMCTHYTQIVWATTNKIGCAVNTCRK 182

HMM        NNNWrKPDPWKhkWYYVCNYCPpGNYmN*
           + W + W+ +Y VCNY P+GN+++
Query 183  MTVW--GEVWENAVYFVCNYSPKGNWIG 208



DKFZphtes3_4f17



group: testes derived 

DKFZphtes3_4f17 encodes a novel 656 amino acid protein with weak similarity to
methyl-CpG-binding proteins.

Methylation at the DNA sequence 5'-CpG is required for mammalian development.
Methyl-CpG-binding proteins bind specifically to methylated DNA via a related amino acid
motif and can repress transcription. The novel protein does not contain such a motif.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.



similarity to methyl-CpG-binding protein 

extension of HS557771/HSZ78337, 

there are some differences to these sequences 

Sequenced by AGOWA 

Locus: /map="18"

Insert length: 2320 bp

Poly A stretch at pos. 2266, polyadenylation signal at pos. 2251



1 GGCAGGTTCG CGGGTCGCTG GCGGGGGTCG TGAGGGAGTG CGCCGGGAGC 
51 GGAGATATGG AGGGAGATGG TTCAGACCCA GAGCCTCCAG ATGCCGGGGA 
101 GGACAGCAAG TCCGAGAATG GGGAGAATGC GCCCATCTAC TGCATCTGCC 
151 GCAAACCGGA CATCAACTGC TTCATGATCG GGTGTGACAA CTGCAATGAG 
201 TGGTTCCATG GGGACTGCAT CCGGATCACT GAGAAGATGG CCAAGGCCAT 
251 CCGGGAGTGG TACTGTCGGG AGTGCAGAGA GAAAGACCCC AAGCTAGAGA 
301 TTCGCTATCG GCACAAGAAG TCACGGGAGC GGGATGGCAA TGAGCGGGAC 
351 AGCAGTGAGC CCCGGGATGA GGGTGGAGGG CGCAAGAGGC CTGTCCCTGA 
401 TCCAGACCTG CAGCGCCGGG CAGGGTCAGG GACAGGGGTT GGGGCCATGC 
451 TTGCTCGGGG CTCTGCTTCG CCCCACAAAT CCTCTCCGCA GCCCTTGGTG
501 GCCACACCCA GCCAGCATCA CCAGCAGCAG CAGCAGCAGA TCAAACGGTC 
551 AGCCCGCATG TGTGGTGAGT GTGAGGCATG TCGGCGCACT GAGGACTGTG 
601 GTCACTGTGA TTTCTGTCGG GACATGAAGA AGTTCGGGGG CCCCAACAAG 
651 ATCCGGCAGA AGTGCCGGCT GCGCCAGTGC CAGCTGCGGG CCCGGGAATC 
701 GTACAAGTAC TTCCCTTCCT CGCTCTCACC AGTGACGCCC TCAGAGTCCC 
751 TGCCAAGGCC CCGCCGGCCA CTGCCCACCC AACAGCAGCC ACAGCCATCA 
801 CAGAAGTTAG GGCGCATCCG TGAAGATGAG GGGGCAGTGG CGTCATCAAC 
851 AGTCAAGGAG CCTCCTGAGG CTACAGCCAC ACCTGAGCCA CTCTCAGATG 
901 AGGACCTACC TCTGGATCCT GACCTGTATC AGGACTTCTG TGCAGGGGCC 
951 TTTGATGACC ATGGCCTGCC CTGGATGAGC GACACAGAAG AGTCCCCATT 
1001 CCTGGACCCC GCGCTGCGGA AGAGGGCAGT GAAAGTGAAG CATGTGAAGC 
1051 GTCGGGAGAA GAAGTCTGAG AAGAAGAAGG AGGAGCGATA CAAGCGGCAT 
1101 CGGCAGAAGC AGAAGCACAA GGATAAATGG AAACACCCAG AGAGGGCTGA 
1151 TGCCAAGGAC CCTGCGTCAC TGCCCCAGTG CCTGGGGCCC GGCTGTGTGC 
1201 GCCCCGCCCA GCCCAGCTCC AAGTATTGCT CAGATGACTG TGGCATGAAG 
1251 CTGGCAGCCA ACCGCATCTA CGAGATCCTC CCCCAGCGCA TCCAGCAGTG 
1301 GCAGCAGAGC CCTTGCATTG CTGAAGAGCA CGGCAAGAAG CTGCTCGAAC 
1351 GCATTCGCCG AGAGCAGCAG AGTGCCCGCA CCCGCCTTCA GGAAATGGAA 
1401 CGCCGATTCC ATGAGCTTGA GGCCATCATT CTACGTGCCA AGCAGCAGGC 
1451 TGTGCGCGAG GATGAGGAGA GCAACGAGGG TGACAGTGAT GACACAGACC
1501 TGCAGATCTT CTGTGTTTCC TGTGGGCACC CCATCAACCC ACGTGTTGCC 
1551 TTGCGCCACA TGGAGCGCTG CTACGCCAAG TATGAGAGCC AGACGTCCTT 
1601 TGGGTCCATG TACCCCACAC GCATTGAAGG GGCCACACGA CTCTTCTGTG 
1651 ATGTGTATAA TCCTCAGAGC AAAACATACT GTAAGCGGCT CCAGGTGCTG 
1701 TGCCCCGAGC ACTCACGGGA CCCCAAAGTG CCAGCTGACG AGGTATGCGG 
1751 GTGCCCCCTT GTACGTGATG TCTTTGAGCT CACGGGTGAC TTCTGCCGCC
1801 TGCCCAAGCG CCAGTGCAAT CGCCATTACT GCTGGGAGAA GCTGCGGCGT 
1851 GCGGAAGTGG ACTTGGAGCG CGTGCGTGTG TGGTACAAGC TGGACGAGCT 
1901 GTTTGAGCAG GAGCGCAATG TGCGCACAGC CATGACAAAC CGCGCGGGAT 
1951 TGCTGGCCCT GATGCTGCAC CAGACGATCC AGCACGATCC CCTCACTACC 
2001 GACCTGCGCT CCAGTGCCGA CCGCTGAGCC TCCTGGCCCG GACCCCTTAC 
2051 ACCCTGCATT CCAGATGGGG GAGCCGCCCG GTGCCCGTGT GTCCGTTCCT 
2101 CCACTCATCT GTTTCTCCGG TTCTCCCTGT GCCCATCCAC CGGTTGACCG 
2151 CCCATCTGCC TTTATCAGAG GGACTGTCCC CGTCGACATG TTCAGTGCCT 
2201 GGTGGGGCTG CGGAGTCCAC TCATCCTTGC CTCCTCTCCC TGGGTTTTGT 
2251 TAATAAAATT TTGAAGAAAC CAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2301 AAAAAAAAAA AAAAAAAAAA 
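
The numbered sequence rows above can be joined into one string and searched for the canonical AATAAA polyadenylation hexamer. This is a minimal sketch: `parse_numbered` and `find_signal` are hypothetical helpers, and the position convention of the report (0-based or 1-based) is not stated in the text, so the result is returned as a 1-based coordinate by assumption.

```python
def parse_numbered(lines):
    """Join 'position block block ...' rows into (first_position, sequence).

    Leading all-digit tokens are concatenated, which also absorbs position
    labels split by extraction (for example '17 51' for 1751).
    """
    first, parts = None, []
    for line in lines:
        tokens = line.split()
        digits = []
        while tokens and tokens[0].isdigit():
            digits.append(tokens.pop(0))
        if not digits or not tokens:
            continue  # skip rows without both a label and sequence blocks
        if first is None:
            first = int("".join(digits))
        parts.append("".join(tokens).upper())
    return first, "".join(parts)


def find_signal(first, seq, signal="AATAAA"):
    """1-based position of the first occurrence of the hexamer, or None."""
    i = seq.find(signal)
    return None if i < 0 else first + i


# Last three rows of the DKFZphtes3_4f17 insert.
rows = [
    "2201 GGTGGGGCTG CGGAGTCCAC TCATCCTTGC CTCCTCTCCC TGGGTTTTGT",
    "2251 TAATAAAATT TTGAAGAAAC CAAAAAAAAA AAAAAAAAAA AAAAAAAAAA",
    "2301 AAAAAAAAAA AAAAAAAAAA",
]
first, seq = parse_numbered(rows)
print(first, len(seq), find_signal(first, seq))
```

With 1-based counting the hexamer starts one position after the 2251 quoted in the clone notes, which would be consistent with the notes using a 0-based offset; that reading is an observation, not something the source confirms.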



BLAST Results

Entry HS557771 from database EMBLEST:

Human chromosome 18 clone 2 mRNA sequence.

Score = 7582, P = 0.0e+00, identities = 1560/1598

Entry HSZ78337 from database EMBLEST:

H. sapiens mRNA, expressed sequence tag ICRFp507H02194 (5')
Score = 6339, P = 9.0e-281, identities = 1307/1347

Entry HS095149 from database EMBL:
human STS WI-6941.
Score = 1210, P = 2.2e-49, identities = 246/251



Medline entries 



98449942: 

Identification and characterization of a family of mammalian methyl-CpG 
binding proteins. 

9824997: 

Gene silencing by methyl-CpG-binding proteins. 



Peptide information for frame 3 



ORF from 57 bp to 2024 bp; peptide length: 656 
Category: similarity to known protein 
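
The ORF coordinates and the peptide length can be checked against each other, and the start of the peptide can be re-derived from the nucleotide listing. The sketch below assumes the standard genetic code and that the ORF coordinates are inclusive and exclude the stop codon; `orf_codons` and `translate` are hypothetical helper names.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_codons(start, end):
    """Number of codons in an ORF given inclusive 1-based coordinates."""
    length = end - start + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3


def translate(dna):
    """Translate complete codons of an uppercase/lowercase DNA string."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Nucleotides 57-89 of the DKFZphtes3_4f17 insert (first eleven codons).
head = "ATGGAGGGAGATGGTTCAGACCCAGAGCCTCCA"
print(orf_codons(57, 2024), translate(head))
```

A quick comparison of the translated prefix with the first residues of the peptide listing is a cheap way to spot single-character extraction errors in either sequence.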



1 MEGDGSDPEP PDAGEDSKSE NGENAPIYCI CRKPDINCFM IGCDNCNEWF
51 HGDCIRITEK MAKAIREWYC RECREKDPKL EIRYRHKKSR ERDGNERDSS 
101 EPRDEGGGRK RPVPDPDLQR RAGSGTGVGA MLARGSASPH KSSPQPLVAT 
151 PSQHHQQQQQ QIKRSARMCG ECEACRRTED CGHCDFCRDM KKFGGPNKIR 
201 QKCRLRQCQL RARESYKYFP SSLSPVTPSE SLPRPRRPLP TQQQPQPSQK 
251 LGRIREDEGA VASSTVKEPP EATATPEPLS DEDLPLDPDL YQDFCAGAFD 
301 DHGLPWMSDT EESPFLDPAL RKRAVKVKHV KRREKKSEKK KEERYKRHRQ 
351 KQKHKDKWKH PERADAKDPA SLPQCLGPGC VRPAQPSSKY CSDDCGMKLA 
401 ANRIYEILPQ RIQQWQQSPC IAEEHGKKLL ERIRREQQSA RTRLQEMERR
451 FHELEAIILR AKQQAVREDE ESNEGDSDDT DLQIFCVSCG HPINPRVALR 
501 HMERCYAKYE SQTSFGSMYP TRIEGATRLF CDVYNPQSKT YCKRLQVLCP 
551 EHSRDPKVPA DEVCGCPLVR DVFELTGDFC RLPKRQCNRH YCWEKLRRAE 
601 VDLERVRVWY KLDELFEQER NVRTAMTNRA GLLALMLHQT IQHDPLTTDL 
651 RSSADR 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4f17, frame 3

TREMBL:CEF52B11_4 gene: "F52B11.1"; Caenorhabditis elegans cosmid
F52B11, N = 2, Score = 316, P = 8.8e-27

TREMBL:HSAB2331_1 gene: "KIAA0333"; Human mRNA for KIAA0333 gene,
partial cds., N = 2, Score = 163, P = 2.8e-13

TREMBL:SPCC594_5 gene: "SPCC594.05c"; product: "putative
transcriptional regulatory protein, phd finger containing"; S.pombe
chromosome III cosmid c594., N = 3, Score = 168, P = 3.6e-12

TREMBL:AF072240_1 gene: "Mbd1"; product: "methyl-CpG binding protein
MBD1"; Mus musculus methyl-CpG binding protein MBD1 (Mbd1) mRNA,
complete cds., N = 2, Score = 189, P = 7.6e-11



>TREMBL:CEF52B11_4 gene: "F52B11.1"; Caenorhabditis elegans cosmid F52B11
Length = 523

HSPs:

Score = 316 (47.4 bits), Expect = 8.8e-27, Sum P(2) = 8.8e-27
Identities = 100/336 (29%), Positives = 167/336 (49%)

[Aligned residues of this HSP did not survive extraction; block start positions and match-line fragments are kept in their original order.]

Query block starts: 333, 391, 446, 504, 564, 608
Sbjct block starts: 118, 177, 237, 291, 347, 407

+R +Q+
+ +A +P P QCL P C+ ++ SKY
CSD+CG +LA R+ EILP R +Q+
++L EI + K Q + +E
E+
+ 1 RE Q
D +L C+ CG P P + +H+E
-DDNLYEGCIVCGLPDIPLLKYTKHIE 290
C+A+ E SFG+ P +
CG P
++ K+ EL
V ++ E+
+C+ Y+ ++ ++CKRL+ LCPEH +
CR K C++H+ W
+V
++LE+
T A L++M+H+
LR+ A

Score = 53



Identities = 24/100 (24%), Positives = 41/100 (41%)

Query: 169 CGECEACRRTEDCGHCDFCR DMKK-FGGPNKIRQKCRLRQCQLRARESYKYFPSS 222 

C C C ++CG C CR DM+K F +K + RQ + + + 

Sbjct: 17 CMNCIRCNDEKNCGTCWPCRNGKTCDMRKCFSAKRLYNEKVK-RQTDENLK-AIMAKTAQ 74 

Query: 223 LSPVTPSESLPRPRRPLPTQQQPQPSQKLGRIR-EDEGAVASS 264 

+ + p P+ +QQ + +K GR + G A++ 
Sbjct: 75 REAAHQAATTTAPSAPVVIEQQVE-KKKRGRKKGSGNGGAAAA 116 

Score = 48 (7.2 bits), Expect = 2.9e-26, Sum P(2) = 2.9e-26
Identities = 13/39 (33%), Positives = 19/39 (48%)

Query: 179 EDCGHCDFCRDMKKFGG--PNKIRQKCRLRQCQLRARESY 216

EC+CCDKG P++C +R+C A+ Y 
Sbjct: 15 ERCMNCIRCNDEKNCGTCWPCRNGKTCDMRKC-FSAKRLY 53 



Pedant information for DKFZphtes3_4f17, frame 3

Report for DKFZphtes3_4f17.3

[LENGTH] 656
[MW] 75711.71
[pI] 8.61
[HOMOL] TREMBL:CEF52B11_4 gene: "F52B11.1"; Caenorhabditis elegans cosmid F52B11 3e-25
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL138c] 3e-10
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YNL097c] 2e-04
[PROSITE] MYRISTYL 6
[PROSITE] AMIDATION 2
[PROSITE] CK2_PHOSPHO_SITE 8
[PROSITE] TYR_PHOSPHO_SITE 3
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 9
[KW] All_Alpha
[KW] LOW_COMPLEXITY 18.75 %
[KW] COILED_COIL 4.57 %

SEQ MEGDGSDPEPPDAGEDSKSENGENAPIYCICRKPDINCFMIGCDNCNEWFHGDCIRITEK
SEG
PRD cccccccccccccccccccccccccceeeeeeccccceeeeecccccccccccchhhhhh
COILS

SEQ MAKAIREWYCRECREKDPKLEIRYRHKKSRERDGNERDSSEPRDEGGGRKRPVPDPDLQR
SEG
PRD hhhhhhhhhhhccccccccchhhhhhhhhccccccccccccccccccccccccccccccc
COILS

SEQ RAGSGTGVGAMLARGSASPHKSSPQPLVATPSQHHQQQQQQIKRSARMCGECEACRRTED
SEG xxxxxxxxx
PRD cccccccceeeecccccccccccccccccchhhhhhhhhhhhhhhhhhcccccccccccc
COILS

SEQ CGHCDFCRDMKKFGGPNKIRQKCRLRQCQLRARESYKYFPSSLSPVTPSESLPRPRRPLP
SEG xxxxxxxxxxxxxx xxxxxxxxxxxxxx
PRD cccccccccccccccccchhhhhhhhhhhhhhhhhhcccccccccccccccccccccccc
COILS

SEQ TQQQPQPSQKLGRIREDEGAVASSTVKEPPEATATPEPLSDEDLPLDPDLYQDFCAGAFD
SEG xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxx
PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
COILS

SEQ DHGLPWMSDTEESPFLDPALRKRAVKVKHVKRREKKSEKKKEERYKRHRQKQKHKDKWKH
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
PRD cccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchh
COILS

SEQ PERADAKDPASLPQCLGPGCVRPAQPSSKYCSDDCGMKLAANRIYEILPQRIQQWQQSPC
SEG
PRD hhhhhccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhccch
COILS

SEQ IAEEHGKKLLERIRREQQSARTRLQEMERRFHELEAIILRAKQQAVREDEESNEGDSDDT
SEG xxxxxxxxxxxxx
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccc
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ DLQIFCVSCGHPINPRVALRHMERCYAKYESQTSFGSMYPTRIEGATRLFCDVYNPQSKT
SEG x
PRD ceeeeeeeccccccccchhhhhhhhhhhhhhcccccccccccccccceeeeeeccccccc
COILS

SEQ YCKRLQVLCPEHSRDPKVPADEVCGCPLVRDVFELTGDFCRLPKRQCNRHYCWEKLRRAE
SEG
PRD cchhhhhhhccccccccccceeeeccccchhhhhccccccccccccccchhhhhhhhhhh
COILS

SEQ VDLERVRVWYKLDELFEQERNVRTAMTNRAGLLALMLHQTIQHDPLTTDLRSSADR
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccc
COILS



Prosite for DKFZphtes3_4f17.3

PS00002  124->128  GLYCOSAMINOGLYCAN  PDOC00002
PS00005  58->61    PKC_PHOSPHO_SITE   PDOC00005
PS00005  165->168  PKC_PHOSPHO_SITE   PDOC00005
PS00005  215->218  PKC_PHOSPHO_SITE   PDOC00005
PS00005  248->251  PKC_PHOSPHO_SITE   PDOC00005
PS00005  265->268  PKC_PHOSPHO_SITE   PDOC00005
PS00005  337->340  PKC_PHOSPHO_SITE   PDOC00005
PS00005  387->390  PKC_PHOSPHO_SITE   PDOC00005
PS00005  439->442  PKC_PHOSPHO_SITE   PDOC00005
PS00005  627->630  PKC_PHOSPHO_SITE   PDOC00005
PS00006  6->10     CK2_PHOSPHO_SITE   PDOC00006
PS00006  17->21    CK2_PHOSPHO_SITE   PDOC00006
PS00006  227->231  CK2_PHOSPHO_SITE   PDOC00006
PS00006  265->269  CK2_PHOSPHO_SITE   PDOC00006
PS00006  280->284  CK2_PHOSPHO_SITE   PDOC00006
PS00006  308->312  CK2_PHOSPHO_SITE   PDOC00006
PS00006  521->525  CK2_PHOSPHO_SITE   PDOC00006
PS00006  652->656  CK2_PHOSPHO_SITE   PDOC00006
PS00007  339->346  TYR_PHOSPHO_SITE   PDOC00007
PS00007  500->507  TYR_PHOSPHO_SITE   PDOC00007
PS00007  211->219  TYR_PHOSPHO_SITE   PDOC00007
PS00008  42->48    MYRISTYL           PDOC00008
PS00008  123->129  MYRISTYL           PDOC00008
PS00008  125->131  MYRISTYL           PDOC00008
PS00008  129->135  MYRISTYL           PDOC00008
PS00008  259->265  MYRISTYL           PDOC00008
PS00008  396->402  MYRISTYL           PDOC00008
PS00009  107->111  AMIDATION          PDOC00009
PS00009  425->429  AMIDATION          PDOC00009



(No Pfam data available for DKFZphtes3_4f17.3)

DKFZphtes3_4f5



group: signal transduction 

DKFZphtes3_4f5.3 encodes a novel 790 amino acid protein similar to beta-transducins.

The protein contains 3 WD-40 repeats, which are typical for the beta-transducin subunit of
G-proteins. The beta subunits seem to be required for the replacement of GDP by GTP as well as
for membrane anchoring and receptor recognition. In addition, a Cytochrome C family
heme-binding site signature is present. The protein is larger (790 amino acids) than the usual
eukaryotic G-beta transducins (about 340 amino acids).

The new protein can find application in modulating/blocking G-protein-dependent pathways.



similarity to S.pombe "beta-transducin"

complete cDNA, complete cds, EST hits

on genomic level encoded by HS313D11, at least 7 exons; these exons
match only partially with the predicted transcripts in HS313D11

Sequenced by AGOWA
Locus: /map="16p13.3"
Insert length: 3166 bp

No poly A stretch found, no polyadenylation signal found



1 GGCGGCTTCC GGCGCGGCGG TTCCGGACAA CCGTGCGCTT TTAGTAAAAG 
51 ATTGGGGTTC GCGCGGGGGA GAAGGGCTGC CCCGGGCCCT CTGGTTCTCG 
101 TCCCGCAGCG TCCGCTCCCC CGCGCCACTG CGCCGCTCCC AGGAACCCTG 
151 TACTCCGGGG TCGCCGGCTT CTCTCCTGCC TCCGGTCCCG CCAGACACCT 
201 CGAGCTCCTT AAGTAGCTCG GTCCTTGACG TCCCTCTGGG CCCTTCCCGC 
251 GTCTATCGCC TGAGTCCCCG GGCCCCTCTA GCCCTCTGTT CCCTCCCCTC 
301 TTTTGTTCCT CCCTAGAGCC CCGCCGCCCT CAGGGCTGAC AGTGTGGACG 
351 GCGGGAGTCT CCTCGCTCCC CTGCTGGGAT TGACTGACCG AGCGTTTAGT 
401 GACTGCCCAG ATCTGGCTGA TGGGGGTACC GAGAGGTGGC CTGGGCCGGG 
451 AATGTCCAGC TAGAGTCTTC CGTGGAAGTC AGACATGAAA CTGACAGGCC
501 TAAGGGAAGC TAGGAAGTCC CCTCACCGCT CAGCCAGGGT GATGGGCTGG 
551 ACTGACAGAC TCCAGTGAAT TTGAGCTTGC CTGTCAGGCT GATTGGCTGA 
601 TAGACAGCCC TGGATTGGCT CACTAAGACT GACCAGCCCG GGACCAAGCA 
651 GTTCTGGGGT CCCAACCTGG GTGGAAGGTC TGAACTGATG ACCCACCCAG 
701 GCTGACCAGG CCAGCCCACC TCACTGACCT CCTGACCCCT GACCTCATCA
751 CCTGTGCAGC CATGGAGAAG ATGTCCCGTG TGACCACAGC CCTGGGTGGC
801 AGCGTGCTGA CAGGCCGCAC CATGCACTGC CACCTGGATG CTCCCGCCAA 
851 TGCCATCAGT GTGTGCCGCG ACGCAGCCCA GGTGGTCGTG GCAGGCCGTA 
901 GCATCTTCAA GATCTATGCC ATCGAGGAGG AACAGTTCGT GGAAAAGCTG 
951 AACCTGCGTG TGGGGCGCAA GCCTTCGCTT AACCTGAGCT GTGCTGACGT 
1001 GGTCTGGCAC CAGATGGATG AGAACCTGCT GGCCACAGCA GCCACCAATG 
1051 GCGTGGTGGT CACGTGGAAC CTGGGCCGGC CATCCCGCAA CAAGCAGGAC 
1101 CAGCTGTTCA CAGAACACAA GCGCACGGTA AACAAAGTCT GCTTCCACCC 
1151 CACCGAAGCC CACGTGCTGC TCAGTGGCTC CCAGGATGGC TTCATGAAGT 
1201 GCTTTGACCT CCGCAGAAAG GACTCTGTCA GCACCTTCTC GGGCCAGTCG 
1251 GAGAGCGTGC GGGACGTGCA GTTCAGTATC CGGGACTACT TCACCTTCGC
1301 CTCCACCTTT GAGAACGGCA ATGTGCAGCT CTGGGACATC CGGCGTCCCG 
1351 ACCGGTGCGA GAGGATGTTC ACAGCCCACA ACGGACCCGT CTTCTGCTGC 
1401 GACTGGCACC CCGAGGACAG GGGCTGGTTG GCCACTGGAG GGCGCGACAA 
1451 GATGGTGAAG GTCTGGGACA TGACCACGCA CCGTGCCAAG GAGATGCACT 
1501 GTGTGCAGAC CATCGCCTCG GTGGCCCGTG TGAAGTGGCG GCCAGAGTGC 
1551 CGCCACCACC TGGCCACGTG CTCCATGATG GTGGACCACA ACATCTATGT 
1601 TTGGGACGTG CGCCGGCCCT TCGTGCCAGC TGCCATGTTT GAGGAACACC 
1651 GAGACGTCAC CACGGGAATT GCCTGGCGCC ACCCCCACGA CCCCTCCTTC 
1701 CTGCTGTCTG GCTCCAAGGA CAGCTCGCTG TGCCAGCACC TGTTCCGCGA 
1751 CGCCAGCCAG CCCGTCGAGC GCGCCAACCC TGAGGGCCTC TGCTACGGCC
1801 TCTTCGGGGA CCTGGCCTTC GCCGCCAAGG AGAGCCTCGT GGCTGCCGAG 
1851 TCGGGGCGCA AGCCCTACAC TGGCGACCGG CGCCACCCCA TCTTCTTTAA 
1901 GCGCAAGCTG GACCCTGCCG AGCCCTTCGC AGGCCTCGCC TCCAGTGCCC 
1951 TCAGTGTCTT TGAGACGGAG CCAGGTGGCG GCGGCATGCG CTGGTTTGTG
2001 GACACAGCTG AGCGTTATGC GCTGGCTGGC CGGCCACTGG CCGAGCTCTG 
2051 TGACCACAAC GCAAAGGTGG CTCGAGAGCT TGGCCGCAAC CAGGTGGCGC 
2101 AAACGTGGAC CATGCTGCGG ATCATCTACT GCAGCCCTGG CCTAGTGCCC 
2151 ACTGCAAACC TCAACCACAG TGTGGGCAAG GGTGGCTCCT GTGGCCTCCC 
2201 GCTCATGAAC AGTTTCAACC TGAAGGATAT GGCCCCAGGG TTGGGCAGTG 
2251 AGACGCGGCT GGACCGCAGC AAAGGAGATG CACGGAGCGA CACAGTTCTG 
2301 CTCGACTCCT CGGCCACACT CATCACCAAT GAGGATAACG AGGAAACCGA 
2351 GGGCAGCGAC GTACCTGCCG ACTACCTGCT GGGTGACGTG GAAGGTGAGG
2401 AGGACGAGCT GTACCTGCTG GATCCGGAAC ACGCGCACCC CGAGGACCCT
2451 GAGTGCGTGC TGCCGCAGGA GGCCTTTCCG CTGCGCCACG AGATCGTGGA
2501 CACGCCTCCC GGACCCGAGC ACCTGCAGGA CAAGGCCGAC TCCCCGCACG 
2551 TGAGCGGCAG CGAGGCGGAT GTGGCCTCCC TGGCCCCCGT GGACTCCTCC 
2601 TTCTCGCTCC TGTCTGTCTC ACACGCGCTC TACGACAGCC GCCTGCCGCC 
2651 CGACTTCTTC GGCGTGCTGG TGCGCGACAT GCTGCACTTC TACGCTGAGC 
2701 AGGGCGACGT GCAGATGGCT GTGTCTGTGC TCATCGTCCT GGGTGAACGG 
2751 GTGCGCAAGG ACATCGACGA GCAGACCCAG GAGCACTGGT ACACTTCCTA
2801 CATCGACCTG CTGCAGCGCT TCCGCCTCTG GAACGTGTCC AACGAGGTGG 
2851 TCAAGCTGAG CACCAGCCGC GCCGTCAGCT GCCTCAACCA GGCCTCCACC 
2901 ACCCTGCACG TCAACTGCAG CCACTGCAAG CGGCCCATGA GCAGCCGGGG 
2951 CTGGGTCTGC GACAGGTGCC ACCGCTGCGC CAGCATGTGT GCCGTCTGCC 
3001 ACCACGTAGT CAAGGGTCTC TTCGTGTGGT GCCAGGGCTG CAGCCACGGC 
3051 GGCCACCTGC AGCACATCAT GAAGTGGCTG GAAGGCAGCT CCCACTGTCC 
3101 CGCAGGCTGC GGCCACCTCT GCGAGTACTC CTGACGGGGC ATCTGCTGGG 
3151 CTTGCCCGGG CGGCCG 



BLAST Results 



Entry HS313D11 from database EMBL: 

Human DNA sequence from cosmid 313D11 from a contig on the short arm of 
chromosome 16. Contains ESTs, STS and CpG islands. 
Score = 6238, P = 0.0e+00, identities = 1318/1391



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 762 bp to 3131 bp; peptide length: 790 
Category: similarity to known protein 



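1 MEKMSRVTTA LGGSVLTGRT MHCHLDAPAN AISVCRDAAQ VVVAGRSIFK
51 IYAIEEEQFV EKLNLRVGRK PSLNLSCADV VWHQMDENLL ATAATNGVVV
101 TWNLGRPSRN KQDQLFTEHK RTVNKVCFHP TEAHVLLSGS QDGFMKCFDL
151 RRKDSVSTFS GQSESVRDVQ FSIRDYFTFA STFENGNVQL WDIRRPDRCE
201 RMFTAHNGPV FCCDWHPEDR GWLATGGRDK MVKVWDMTTH RAKEMHCVQT
251 IASVARVKWR PECRHHLATC SMMVDHNIYV WDVRRPFVPA AMFEEHRDVT
301 TGIAWRHPHD PSFLLSGSKD SSLCQHLFRD ASQPVERANP EGLCYGLFGD
351 LAFAAKESLV AAESGRKPYT GDRRHPIFFK RKLDPAEPFA GLASSALSVF
401 ETEPGGGGMR WFVDTAERYA LAGRPLAELC DHNAKVAREL GRNQVAQTWT
451 MLRIIYCSPG LVPTANLNHS VGKGGSCGLP LMNSFNLKDM APGLGSETRL
501 DRSKGDARSD TVLLDSSATL ITNEDNEETE GSDVPADYLL GDVEGEEDEL
551 YLLDPEHAHP EDPECVLPQE AFPLRHEIVD TPPGPEHLQD KADSPHVSGS
601 EADVASLAPV DSSFSLLSVS HALYDSRLPP DFFGVLVRDM LHFYAEQGDV
651 QMAVSVLIVL GERVRKDIDE QTQEHWYTSY IDLLQRFRLW NVSNEVVKLS
701 TSRAVSCLNQ ASTTLHVNCS HCKRPMSSRG WVCDRCHRCA SMCAVCHHVV
751 KGLFVWCQGC SHGGHLQHIM KWLEGSSHCP AGCGHLCEYS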

BLASTP hits 

Entry YDSB_SCHPO from database SWISSPROT:

HYPOTHETICAL 93.2 KD TRP-ASP REPEATS CONTAINING PROTEIN C4F8.11 IN
CHROMOSOME I. >TREMBL:SPAC4F8_11 gene: "SPAC4F8.11"; product:
"beta-transducin"; S.pombe chromosome I cosmid c4F8.
Score = 404, P = 3.0e-42, identities = 169/639, positives = 278/639

Entry PEX7_HUMAN from database SWISSPROT:

PEROXISOMAL TARGETING SIGNAL 2 RECEPTOR (PTS2 RECEPTOR) (PEROXIN-7).
>TREMBL:HSU76560_1 gene: "Pex7"; product: "peroxisome targeting signal
2 receptor"; Human peroxisome targeting signal 2 receptor (Pex7) mRNA,
complete cds. >TREMBL:HSU88871_1 gene: "HsPEX7"; product: "HsPex7p";
Human HsPex7p (HSPEX7) mRNA, complete cds.

Score = 220, P = 1.1e-15, identities = 62/244, positives = 107/244

Entry PEX7_MOUSE from database SWISSPROT:

PEROXISOMAL TARGETING SIGNAL 2 RECEPTOR (PTS2 RECEPTOR) (PEROXIN-7).
>TREMBL:MMU69171_1 product: "peroxisomal PTS2 receptor"; Mus musculus
peroxisomal PTS2 receptor mRNA, complete cds.

Score = 214, P = 5.3e-15, identities = 60/240, positives = 106/240

Entry ATAC2294_7 from database TREMBL:

gene: "F11P17.7"; Arabidopsis thaliana chromosome I BAC F11P17 genomic
sequence, complete sequence.

Score = 232, P = 3.4e-14, identities = 68/260, positives = 120/260

Entry S66835 from database PIR:

probable membrane protein YOL138c - yeast (Saccharomyces cerevisiae)
>TREMBL:SCYOL138C_1 S. cerevisiae chromosome XV reading frame ORF

Score = 136, P = 2.5e-13, identities = 24/77, positives = 44/77



Alert BLASTP hits for DKFZphtes3_4f5, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphtes3_4f5, frame 3

Report for DKFZphtes3_4f5.3

[LENGTH] 790
[MW] 88207.10
[pI] 6.05
[HOMOL] SWISSPROT:YDSB_SCHPO HYPOTHETICAL 93.2 KD TRP-ASP REPEATS CONTAINING PROTEIN C4F8.11 IN CHROMOSOME I. 9e-44
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOL138c] 5e-16
[FUNCAT] 10.04.09 regulation of g-protein activity [S. cerevisiae, YBR195c] 3e-11
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YBR195c] 3e-11
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YBR195c] 3e-11
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YBR195c] 3e-11
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YBR195c] 3e-11
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YCR072c beta-transducin family] 3e-10
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 9e-09
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YLL011w] 1e-07
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL195w] 2e-07
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL195w] 2e-07
[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YDR142c] 4e-07
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR142c] 4e-07
[FUNCAT] 08.10 peroxisomal transport [S. cerevisiae, YDR142c] 4e-07
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YER107c] 4e-07
[FUNCAT] 04.07 rna transport [S. cerevisiae, YER107c] 4e-07
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER107c] 4e-07
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YGL003c] 5e-07
[FUNCAT] 06.13 proteolysis [S. cerevisiae, YGL003c] 5e-07
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YCR084c] 8e-07
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPR178w] 1e-06
[FUNCAT] 03.13 meiosis [S. cerevisiae, YLR129w] 3e-06
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YCR057c] 1e-05
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YCR057c] 1e-05
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YEL056w] 2e-04
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YOR272w] 6e-04
[SCOP] d1gotb_ 2.46.3.1.1 beta1-subunit of the signal-transducing 5e-06
[PIRKW] duplication 7e-10
[PIRKW] signal transduction 7e-08
[PIRKW] peroxisome 9e-06
[PIRKW] heterotrimer 7e-08
[PIRKW] GTP binding 7e-08
[PIRKW] peroxisome biogenesis 9e-06
[PIRKW] transmembrane protein 1e-14
[SUPFAM] MSI1 protein 7e-10
[SUPFAM] WD repeat homology 1e-14
[SUPFAM] GTP-binding regulatory protein beta chain 7e-08
[SUPFAM] PRL1 protein 3e-08
[SUPFAM] coatomer complex beta' chain 1e-06
[PROSITE] CYTOCHROME_C 1
[PROSITE] WD_REPEATS 3
[PROSITE] MYRISTYL 10
[PROSITE] AMIDATION 2
[PROSITE] CAMP_PHOSPHO_SITE 2
[PROSITE] CK2_PHOSPHO_SITE 11
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 7
[PROSITE] ASN_GLYCOSYLATION 4
[PFAM] WD domain, G-beta repeats
[KW] All_Beta
[KW] 3D
[KW] LOW_COMPLEXITY 2.28 %



SEQ MEKMSRVTTALGGSVLTGRTMHCHLDAPANAISVCRDAAQVVVAGRSIFKIYAIEEEQFV
SEG
1gotB

SEQ EKLNLRVGRKPSLNLSCADVVWHQMDENLLATAATNGVVVTWNLGRPSRNKQDQLFTEHK
SEG
1gotB TTCEEEEEETTTEEEEEET-TTTCEEE--EEECCC

SEQ RTVNKVCFHPTEAHVLLSGSQDGFMKCFDLRRKDSVSTFSGQSESVRDVQFSIRDYFTFA
SEG
1gotB CCEEEEEEETT-TCEEEEEETTTEEEEEETTTTEEEEEECBTTCCEEEEEETTTTTEEEE

SEQ STFENGNVQLWDIRRPDRCERMFTAHNGPVFCCDWHPEDRGWLATGGRDKMVKVWDMTTH
SEG
1gotB E-ETTTEEEEEETTTTEEEE-EEECCCCCEEEEEE-TTTTCCEEEEETTTEEEEEC

SEQ RAKEMHCVQTIASVARVKWRPECRHHLATCSMMVDHNIYVWDVRRPFVPAAMFEEHRDVT
SEG
1gotB

SEQ TGIAWRHPHDPSFLLSGSKDSSLCQHLFRDASQPVERANPEGLCYGLFGDLAFAAKESLV
SEG
1gotB

SEQ AAESGRKPYTGDRRHPIFFKRKLDPAEPFAGLASSALSVFETEPGGGGMRWFVDTAERYA
SEG
1gotB

SEQ LAGRPLAELCDHNAKVARELGRNQVAQTWTMLRIIYCSPGLVPTANLNHSVGKGGSCGLP
SEG
1gotB

SEQ LMNSFNLKDMAPGLGSETRLDRSKGDARSDTVLLDSSATLITNEDNEETEGSDVPADYLL
SEG xxxx
1gotB

SEQ GDVEGEEDELYLLDPEHAHPEDPECVLPQEAFPLRHEIVDTPPGPEHLQDKADSPHVSGS
SEG xxxxxxxxxxxxxx
1gotB

SEQ EADVASLAPVDSSFSLLSVSHALYDSRLPPDFFGVLVRDMLHFYAEQGDVQMAVSVLIVL
SEG
1gotB

SEQ GERVRKDIDEQTQEHWYTSYIDLLQRFRLWNVSNEVVKLSTSRAVSCLNQASTTLHVNCS
SEG
1gotB

SEQ HCKRPMSSRGWVCDRCHRCASMCAVCHHVVKGLFVWCQGCSHGGHLQHIMKWLEGSSHCP
SEG
1gotB

SEQ AGCGHLCEYS
SEG
1gotB



Prosite for DKFZphtes3_4f5.3

PS00001  74->78       ASN_GLYCOSYLATION  PDOC00001
PS00001  468->472     ASN_GLYCOSYLATION  PDOC00001
PS00001  691->695     ASN_GLYCOSYLATION  PDOC00001
PS00001  718->722     ASN_GLYCOSYLATION  PDOC00001
PS00004  69->73       CAMP_PHOSPHO_SITE  PDOC00004
PS00004  152->156     CAMP_PHOSPHO_SITE  PDOC00004
PS00005  17->20       PKC_PHOSPHO_SITE   PDOC00005
PS00005  165->168     PKC_PHOSPHO_SITE   PDOC00005
PS00005  172->175     PKC_PHOSPHO_SITE   PDOC00005
PS00005  239->242     PKC_PHOSPHO_SITE   PDOC00005
PS00005  364->367     PKC_PHOSPHO_SITE   PDOC00005
PS00005  701->704     PKC_PHOSPHO_SITE   PDOC00005
PS00005  727->730     PKC_PHOSPHO_SITE   PDOC00005
PS00006  76->80       CK2_PHOSPHO_SITE   PDOC00006
PS00006  165->169     CK2_PHOSPHO_SITE   PDOC00006
PS00006  172->176     CK2_PHOSPHO_SITE   PDOC00006
PS00006  181->185     CK2_PHOSPHO_SITE   PDOC00006
PS00006  398->402     CK2_PHOSPHO_SITE   PDOC00006
PS00006  498->502     CK2_PHOSPHO_SITE   PDOC00006
PS00006  503->507     CK2_PHOSPHO_SITE   PDOC00006
PS00006  522->526     CK2_PHOSPHO_SITE   PDOC00006
PS00006  598->602     CK2_PHOSPHO_SITE   PDOC00006
PS00006  600->604     CK2_PHOSPHO_SITE   PDOC00006
PS00006  679->683     CK2_PHOSPHO_SITE   PDOC00006
PS00007  337->346     TYR_PHOSPHO_SITE   PDOC00007
PS00008  13->19       MYRISTYL           PDOC00008
PS00008  97->103      MYRISTYL           PDOC00008
PS00008  139->145     MYRISTYL           PDOC00008
PS00008  161->167     MYRISTYL           PDOC00008
PS00008  317->323     MYRISTYL           PDOC00008
PS00008  342->348     MYRISTYL           PDOC00008
PS00008  [illegible]  MYRISTYL           PDOC00008
PS00008  460->466     MYRISTYL           PDOC00008
PS00008  474->480     MYRISTYL           PDOC00008
PS00008  759->765     MYRISTYL           PDOC00008
PS00009  67->71       AMIDATION          PDOC00009
PS00009  364->368     AMIDATION          PDOC00009
PS00190  743->749     CYTOCHROME_C       PDOC00169
PS00678  90->105      WD_REPEATS         PDOC00574
PS00678  223->238     WD_REPEATS         PDOC00574
PS00678  269->284     WD_REPEATS         PDOC00574



Pfam for DKFZphtes3_4f5.3



HMM_NAME WD domain, G-beta repeats 

HMM *MrGHnnWVWCVaFSPDGrWFTvSGSWDgTCRLWD* 

++ HN++V C+ ++P+ R +++G++D+ +++WD 
Query 203 FTAHNGPVFCCDWHPEDRGWLATGGRDKMVKVWD 236

DKFZphtes3_4h6



group: intracellular transport/trafficking 

DKFZphtes3_4h6 encodes a novel 622 amino acid protein with strong similarity to the kinesin 
light chain. 

Kinesin is a microtubule-based motor protein that pulls vesicles or organelles towards the 
plus end of microtubules. Structural changes in the protein that drive motility are coupled to 
ATP binding and hydrolysis. The novel protein is similar to kinesin light chain, which is part 
of the functional kinesin holoenzyme tetrameric protein. The light chain has been proposed to 
function in coupling of cargo to the heavy chain or in the modulation of the ATPase activity 
of the heavy chain. The novel protein contains two kinesin light chain repeats and one RGD 
cell-attachment site. 

The novel kinesin protein can find application in modulating the function of kinesin and 
modulating intracellular transport via/on microtubules. 



strong similarity to Kinesin light chain 

complete cDNA, complete cds, start at 150, EST hits (few) 

Sequenced by AGOWA 

Locus: unknown

Insert length: 2992 bp 

Poly A stretch at pos. 2914, polyadenylation signal at pos. 2893 



1 GGCGGGATGG AGGCGGCGGG ACCGGCTCGC GGGTGCGGGT CCGGGTGAAG 
51 CGGGAGGCAG CCAGAGTCGG AGCCGGGCCC GAGCACCAGG CGCAGGCCCG 
101 GCGCCCGCCT GCCCGCACCC TCGTCCTCAC AGACGCCACA GCCATGGCCA 
151 TGATGGTGTT TCCGCGGGAG GAGAAGCTGA GCCAGGATGA GATCGTGCTG 
201 GGCACCAAGG CTGTCATCCA GGGACTGGAG ACTCTGCGTG GGGAGCATCG 
251 TGCCCTGCTG GCTCCTCTGG TTGCACCTGA GGCCGGCGAA GCCGAGCCTG 
301 GCTCGCAGGA GCGCTGCATC CTCCTGCGTC GCTCCCTGGA AGCCATTGAG 
351 CTTGGGCTGG GGGAGGCCCA GGTGATCTTG GCATTGTCGA GCCACCTGGG 
401 GGCTGTAGAA TCAGAGAAGC AGAAGCTGCG GGCGCAGGTG CGGCGTCTGG 
451 TGCAGGAGAA CCAGTGGCTG CGTGAGGAGC TGGCGGGGAC ACAGCAGAAG 
501 CTGCAGCGCA GTGAGCAGGC CGTGGCCCAG CTCGAGGAGG AGAAGCAGCA 
551 CTTGCTGTTC ATGAGCCAGA TCCGCAAGTT GGATGAAGAC GCCTCCCCTA 
601 ACGAGGAGAA GGGGGACGTC CCCAAAGACA CACTGGATGA CCTGTTCCCC 
651 AATGAGGATG AGCAGAGCCC AGCCCCTAGC CCAGGAGGAG GGGATGTGTC 
701 TGGTCAGCAT GGGGGCTACG AGATCCCGGC CCGGCTCCGC ACCCTGCACA 
751 ACCTGGTGAT CCAATACGCC TCACAGGGCC GCTACGAGGT AGCTGTGCCA 
801 CTCTGCAAGC AGGCACTCGA AGACCTGGAG AAGACGTCAG GCCACGACCA 
851 CCCTGACGTT GCCACCATGC TGAACATCCT GGCACTGGTC TATCGGGATC 
901 AGAACAAGTA CAAGGAGGCT GCCCACCTGC TCAATGATGC TCTGGCCATC 
951 CGGGAGAAAA CACTGGGCAA GGACCACCCA GCCGTGGCTG CGACACTAAA 
1001 CAACCTGGCA GTCCTGTATG GCAAGAGGGG CAAGTACAAG GAGGCTGAGC 
1051 CATTGTGCAA GCGGGCACTG GAGATCCGGG AGAAGGTCCT GGGCAAGTTT 
1101 CACCCAGATG TGGCCAAGCA GCTCAGCAAC CTGGCCCTGC TGTGCCAGAA 
1151 CCAGGGCAAA GCTGAGGAGG TGGAATATTA CTATCGGCGG GCACTGGAGA 
1201 TCTATGCTAC ACGCCTCGGG CCCGATGACC CCAATGTGGC CAAGACCAAG 
1251 AACAACCTGG CTTCCTGCTA CCTGAAGCAG GGCAAGTACC AGGATGCGGA 
1301 GACCTTGTAC AAGGAGATCC TCACCCGCGC TCATGAGAAA GAGTTTGGCT 
1351 CTGTCAATGG GGACAACAAG CCCATCTGGA TGCACGCAGA GGAGCGGGAG 
1401 GAAAGCAAGG ATAAGCGCCG GGACAGCGCC CCCTATGGGG AATACGGCAG 
1451 CTGGTACAAG GCCTGTAAAG TAGACAGCCC CACAGTCAAC ACCACCCTGC 
1501 GCAGCTTGGG GGCCCTATAC CGGCGCCAGG GCAAGCTGGA AGCCGCGCAC 
1551 ACACTAGAGG ACTGTGCCAG CCGTAACCGC AAGCAGGGTT TGGACCCCGC 
1601 AAGCCAGACC AAGGTGGTAG AACTGCTGAA AGATGGCAGT GGCAGGCGGG 
1651 GAGACCGCCG CAGCAGCCGA GACATGGCTG GGGGTGCCGG GCCTCGGTCT 
1701 GAGTCTGACC TCGAGGACGT GGGACCTACA GCTGAGTGGA ATGGGGATGG 
1751 CAGTGGCTCC TTGAGGCGCA GCGGTTCCTT TGGGAAACTC CGGGATGCCC
1801 TGAGGCGCAG CAGTGAGATG CTGGTAAAGA AGCTGCAGGG GGGCACCCCC 
1851 CAGGAGCCCC CTAACCCCAG GATGAAGCGG GCCAGTTCCC TCAACTTCCT 
1901 CAACAAGAGC GTGGAAGAGC CGACCCAGCC TGGAGGCACA GGTCTCTCTG 
1951 ACAGCCGCAC TCTCAGCTCC AGCTCCATGG ACCTCTCCCG ACGAAGCTCC 
2001 CTGGTGGGCT AATGCTGAAG GGGCAGCCAG TCACCAGAGC GCCCACCTGG 
2051 CACACCCCCC TCACCCCAGC CCTGCGCATG GGCCTGCTGC TTGTCCCGCC 
2101 TGTCTCTCCC ACAGCCCCTG TCTTTTCTGT TCAATCTCAG GGTAACCTTC 
2151 TCCCTTGTCA TCTCAGCCTG AGCCCTGGAG GCTGGGCCTG CCCACTCCAG 
2201 CTCCATCCCT TATTTATTCC TTCCAGCAGG GCCCTCTTCC CTAGGTTCGG 
2251 GCCAGCAGGA GGTGCCGGCT GGAGTCTCCA CCATAGACTC AGTGGCCTGG 
2301 CCTCCCCAGA CCCCAGAGCC AAGAACACTA AGCACTCGCC GGCCCTTCGG 
2351 CACCCTCGCC CTCCCTCCCG ACTCAACCCG GCCGTTGCTT CTGTATATAG 
2401 AGAAATAAGT TATTGGCCGC GCGCCTCCCT TCAGTCCACG GTACTACCCG 
2451 GGCCTCCCCT CGTCCCTCTT CTAGTGGTAC CGCCCAGGCC TTAATCACCC

2501 CCATTCCGTG CGGTGGTATC TCCCAGGCTC TACATTCTCG GGAGCGGCGC 

2551 CTCCCAAGGG GGTCCTGGGA CCTTCTCGCG CTCCTCCTGG CCTCTGAGGG 

2601 ATGCGTCCTA CCCGCGCCAT CGCCCCGTGG CCCAGGACGG GGACCTCCCC 

2651 TTAGTCCGTC CTCCCACCGC CGGGCCCTGC CCCGCATCCC GGCCTTATGC 

2701 ACTGCCCCTC CCACCCGGCC CCGCCCAGGC ACGGCCGACC CCGCCCCGGG 

2751 CACCGCCCAC CGAGCCATCC TGCCTCGCCT CCCCCCACGC CTGCAGCTTC 

2801 TCGCGAGGGG CGGCGACGGT CCCCTGGTGG CAGGAGGGGC TCCCCCTGTT 

2851 GCGGGTGAGG CGGCTGCTCT CTATTTTCAG ATGTTGCTGT AGAAATAAAG 

2901 ACGGTTTAAA TCTGAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

2951 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AA 
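
The positions quoted for the polyadenylation signal (2893) and the poly A stretch (2914) can be checked against the listing. The following is a minimal sketch, not the annotation pipeline's own method: it assumes the signal is the canonical AATAAA hexamer, that a poly A stretch means a run of at least ten A residues, and that the reported positions are zero-based offsets (the listing itself is numbered from 1). The function names and the `min_len` threshold are hypothetical.

```python
import re

# 3' end of the insert, copied from the listing; the first base is 1-based position 2851.
TAIL_START = 2851
TAIL = (
    "GCGGGTGAGGCGGCTGCTCTCTATTTTCAGATGTTGCTGTAGAAATAAAG"  # 2851-2900
    + "ACGGTTTAAATCTGAAAAAA" + "A" * 30                    # 2901-2950
)


def to_zero_based(tail_index, tail_start=TAIL_START):
    """Convert an index within TAIL to a zero-based offset in the full insert."""
    return (tail_start - 1) + tail_index


def find_signal(seq, signal="AATAAA"):
    """Index of the first occurrence of the signal hexamer, or None if absent."""
    idx = seq.find(signal)
    return idx if idx >= 0 else None


def find_poly_a(seq, min_len=10):
    """Index of the first run of at least min_len A residues, or None if absent."""
    match = re.search("A{%d,}" % min_len, seq)
    return match.start() if match else None


signal_pos = to_zero_based(find_signal(TAIL))
stretch_pos = to_zero_based(find_poly_a(TAIL))
print(signal_pos, stretch_pos)
```

Under these assumptions both values agree with the annotation above, which suggests the report counts positions from zero.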



BLAST Results 



No BLAST result 



Medline entries 



98288268: 

Two kinesin light chain genes in mice. Identification and 
characterization of the encoded proteins. 



Peptide information for frame 3 



ORF from 144 bp to 2009 bp; peptide length: 622 
Category: strong similarity to known protein 
Prosite motifs: RGD (502-505) 
KINESIN_LIGHT (223-265)
KINESIN_LIGHT (265-307)



1 MAMMVFPREE KLSQDEIVLG TKAVIQGLET LRGEHRALLA PLVAPEAGEA 
51 EPGSQERCIL LRRSLEAIEL GLGEAQVILA LSSHLGAVES EKQKLRAQVR 
101 RLVQENQWLR EELAGTQQKL QRSEQAVAQL EEEKQHLLFM SQIRKLDEDA 
151 SPNEEKGDVP KDTLDDLFPN EDEQSPAPSP GGGDVSGQHG GYEIPARLRT 
201 LHNLVIQYAS QGRYEVAVPL CKQALEDLEK TSGHDHPDVA TMLNILALVY 
251 RDQNKYKEAA HLLNDALAIR EKTLGKDHPA VAATLNNLAV LYGKRGKYKE 
301 AEPLCKRALE IREKVLGKFH PDVAKQLSNL ALLCQNQGKA EEVEYYYRRA 
351 LEIYATRLGP DDPNVAKTKN NLASCYLKQG KYQDAETLYK EILTRAHEKE 
401 FGSVNGDNKP IWMHAEEREE SKDKRRDSAP YGEYGSWYKA CKVDSPTVNT 
451 TLRSLGALYR RQGKLEAAHT LEDCASRNRK QGLDPASQTK VVELLKDGSG
501 RRGDRRSSRD MAGGAGPRSE SDLEDVGPTA EWNGDGSGSL RRSGSFGKLR 
551 DALRRSSEML VKKLQGGTPQ EPPNPRMKRA SSLNFLNKSV EEPTQPGGTG 
601 LSDSRTLSSS SMDLSRRSSL VG 
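
The ORF coordinates, reading frame and Prosite motif positions given above can be reproduced with a few lines of code. This is a sketch under stated assumptions rather than the tool that produced the report: coordinates are taken as 1-based and inclusive, the stop codon is assumed to lie outside the quoted ORF range, and motif hits are reported as a start position and an exclusive end, which is how entries such as RGD 502->505 appear to be written. The function names are hypothetical; the regular expression `[ST].[RK]` is a direct translation of the Prosite protein kinase C phosphorylation site pattern [ST]-x-[RK].

```python
import re


def orf_summary(start, end):
    """Return (reading frame, codon count) for a 1-based, inclusive ORF range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    frame = (start - 1) % 3 + 1
    return frame, length // 3


def scan_motif(regex, peptide, first_pos=1):
    """List (start, end) hits; start is 1-based, end is exclusive.

    The lookahead lets overlapping hits be reported as well.
    """
    hits = []
    for match in re.finditer("(?=(%s))" % regex, peptide):
        hits.append((match.start(1) + first_pos, match.end(1) + first_pos))
    return hits


# Residues 501-550 of the peptide listed above.
SEGMENT_START = 501
SEGMENT = "RRGDRRSSRDMAGGAGPRSESDLEDVGPTAEWNGDGSGSLRRSGSFGKLR"

print(orf_summary(144, 2009))
print(scan_motif("RGD", SEGMENT, SEGMENT_START))
print(scan_motif("[ST].[RK]", SEGMENT, SEGMENT_START))
```

For the range 144 to 2009 this gives frame 3 and 622 codons, matching the peptide length, and the RGD hit falls at 502->505.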

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4h6, frame 3 

TREMBL:AF055666_1 gene: "Klc2"; product: "kinesin light chain 2"; Mus
musculus kinesin light chain 2 (Klc2) mRNA, complete cds., N = 1, Score
= 2824, P = 4e-294

PIR:I53013 kinesin light chain - human, N = 1, Score = 1927, P =
4.5e-199

PIR:C41539 kinesin light chain C - rat, N = 1, Score = 1919, P =
3.2e-198

SWISSPROT:KNLC_RAT KINESIN LIGHT CHAIN (KLC)., N = 1, Score = 1919, P =
3.2e-198



>TREMBL:AF055666_1 gene: "Klc2"; product: "kinesin light chain 2"; Mus
musculus kinesin light chain 2 (Klc2) mRNA, complete cds. 
Length = 599 

HSPs: 
Score = 2824 (423.7 bits), Expect = 4.0e-294, P = 4.0e-294
Identities = 558/598 (93%), Positives = 572/598 (95%)

Query: 1 MAMMVFPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLVAPEAGEAEPGSQERCIL 60 

MA MV PREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPL + EAGEAEPGSQERC+L 
Sbjct: 1 MATMVLPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLASHEAGEAEPGSQERCLL 60 

Query: 61 LRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL 120

LRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL
Sbjct: 61 LRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL 120

Query: 121 QRSEQAVAQLEEEKQHLLFMSQIRKLDEDASPNEEKGDVPKDTLDDLFPNEDEQSPAPSP 180

QRSEQAVAQLEEEKQHLLFMSQIRKLDE   P EEKGDVPKD+LDDLFPNEDEQSPAPSP
Sbjct: 121 QRSEQAVAQLEEEKQHLLFMSQIRKLDE-MLPQEEKGDVPKDSLDDLFPNEDEQSPAPSP 179

Query: 181 GGGDVSGQHGGYEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA 240

GGGDV+ QHGGYEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA
Sbjct: 180 GGGDVAAQHGGYEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA 239

Query: 241 TMLNILALVYRDQNKYKEAAHLLNDALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKE 300 

TMLNILALVYRDQNKYK+AAHLLNDALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKE 
Sbjct: 240 TMLNILALVYRDQNKYKDAAHLLNDALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKE 299 

Query: 301 AEPLCKRALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYRRALEIYATRLGP 360 

AEPLCKRALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYRRALEIYATRLGP 
Sbjct: 300 AEPLCKRALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYRRALEIYATRLGP 359 

Query: 361 DDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKEFGSVNGDNKPIWMHAEEREE 420 

DDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKEFGSVNG+NKPIWMHAEEREE 
Sbjct: 360 DDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKEFGSVNGENKPIWMHAEEREE 419 

Query: 421 SKDKRRDSAPYGEYGSWYKACKVDSPTVNTTLRSLGALYRRQGKLEAAHTLEDCASRNRK 480 

SKDKRRD P EYGSWYKACKVDSPTVNTTLR+LGALYR +GKLEAAHTLEDCASR+RK 
Sbjct: 420 SKDKRRDRRPM-EYGSWYKACKVDSPTVNTTLRTLGALYRPEGKLEAAHTLEDCASRSRK 478 

Query: 481 QGLDPASQTKVVELLKDGSGRRGDRRSSRDMAGGAGPRSESDLEDVGPTAEWNGDGSGSL 540 

QGLDPASQTKVVELLKDGSGR G RR SRD+AG   P+SESDLE+ GP AEW+GDGSGSL
Sbjct: 479 QGLDPASQTKVVELLKDGSGR-GHRRGSRDVAG---PQSESDLEESGPAAEWSGDGSGSL 534

Query: 541 RRSGSFGKLRDALRRSSEMLVKKLQGGTPQEPPNPRMKRASSLNFLNKSVEEPTQPGG 598 

RRSGSFGKLRDALRRSSEMLV+KLQGG PQEP N RMKRASSLNFLNKSVEEP QPGG 
Sbjct: 535 RRSGSFGKLRDALRRSSEMLVRKLQGGGPQEP-NSRMKRASSLNFLNKSVEEPVQPGG 591 
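
The identity figures in the HSP header follow from a column-by-column comparison of the aligned rows. The sketch below illustrates the arithmetic on the first alignment row only; the function names are hypothetical, and the use of integer truncation for the percentage is an assumption inferred from the header, where 572/598 (about 95.7%) is printed as 95%.

```python
def count_identities(query, sbjct):
    """Count alignment columns where both rows carry the same residue (gaps excluded)."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


def percent(count, total):
    """Percentage truncated to an integer, as the HSP header appears to print it."""
    return 100 * count // total


# First alignment row (query and subject residues 1-60).
QUERY = "MAMMVFPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLVAPEAGEAEPGSQERCIL"
SBJCT = "MATMVLPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLASHEAGEAEPGSQERCLL"

row_identities = count_identities(QUERY, SBJCT)
print(row_identities, len(QUERY))
print(percent(558, 598), percent(572, 598))
```

Summing the per-row counts over all rows of the HSP would give the numerator quoted in the header.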

Pedant information for DKFZphtes3_4h6, frame 3 

Report for DKFZphtes3_4h6.3 



[LENGTH] 622

[MW] 68934.82

[pI] 6.72

[HOMOL] TREMBL:AF055666_1 gene: "Klc2"; product: "kinesin light chain 2"; Mus musculus
kinesin light chain 2 (Klc2) mRNA, complete cds. 0.0

[BLOCKS] BL00927C Trehalase proteins

[BLOCKS] BL01160I Kinesin light chain repeat proteins

[BLOCKS] BL01160H Kinesin light chain repeat proteins

[BLOCKS] BL01160G Kinesin light chain repeat proteins

[BLOCKS] BL01160F Kinesin light chain repeat proteins

[BLOCKS] BL01160E Kinesin light chain repeat proteins

[BLOCKS] BL01160D Kinesin light chain repeat proteins

[BLOCKS] BL01160C Kinesin light chain repeat proteins

[BLOCKS] BL01160B Kinesin light chain repeat proteins

[BLOCKS] BL01160A Kinesin light chain repeat proteins

[SUPFAM] tetratricopeptide repeat homology 1e-07

[PROSITE] RGD 1

[PROSITE] MYRISTYL 8

[PROSITE] KINESIN_LIGHT 2

[PROSITE] AMIDATION 2

[PROSITE] CAMP_PHOSPHO_SITE 5

[PROSITE] CK2_PHOSPHO_SITE 11

[PROSITE] TYR_PHOSPHO_SITE 3

[PROSITE] PKC_PHOSPHO_SITE 7

[PROSITE] ASN_GLYCOSYLATION 2

[PFAM] Kinesin light chain repeat

[KW] All_Alpha

[KW] LOW_COMPLEXITY 12.54 %

[KW] COILED_COIL 4.98 %
PS00001   449->453   ASN_GLYCOSYLATION   PDOC00001
PS00001   587->591   ASN_GLYCOSYLATION   PDOC00001
PS00004   425->429   CAMP_PHOSPHO_SITE   PDOC00004
PS00004   505->509   CAMP_PHOSPHO_SITE   PDOC00004
PS00004   554->558   CAMP_PHOSPHO_SITE   PDOC00004
PS00004   578->582   CAMP_PHOSPHO_SITE   PDOC00004
PS00004   616->620   CAMP_PHOSPHO_SITE   PDOC00004
PS00005   30->33     PKC_PHOSPHO_SITE    PDOC00005
PS00005   90->93     PKC_PHOSPHO_SITE    PDOC00005
PS00005   451->454   PKC_PHOSPHO_SITE    PDOC00005
PS00005   499->502   PKC_PHOSPHO_SITE    PDOC00005
PS00005   507->510   PKC_PHOSPHO_SITE    PDOC00005
PS00005   539->542   PKC_PHOSPHO_SITE    PDOC00005
PS00005   615->618   PKC_PHOSPHO_SITE    PDOC00005
PS00006   13->17     CK2_PHOSPHO_SITE    PDOC00006
PS00006   151->155   CK2_PHOSPHO_SITE    PDOC00006
PS00006   163->167   CK2_PHOSPHO_SITE    PDOC00006
PS00006   232->236   CK2_PHOSPHO_SITE    PDOC00006
PS00006   470->474   CK2_PHOSPHO_SITE    PDOC00006
PS00006   507->511   CK2_PHOSPHO_SITE    PDOC00006
PS00006   519->523   CK2_PHOSPHO_SITE    PDOC00006
PS00006   521->525   CK2_PHOSPHO_SITE    PDOC00006
PS00006   568->572   CK2_PHOSPHO_SITE    PDOC00006
PS00006   589->593   CK2_PHOSPHO_SITE    PDOC00006
PS00006   610->614   CK2_PHOSPHO_SITE    PDOC00006
PS00007   339->346   TYR_PHOSPHO_SITE    PDOC00007
PS00007   339->347   TYR_PHOSPHO_SITE    PDOC00007
PS00007   424->432   TYR_PHOSPHO_SITE    PDOC00007
PS00008   71->77     MYRISTYL            PDOC00008
PS00008   86->92     MYRISTYL            PDOC00008
PS00008   182->188   MYRISTYL            PDOC00008
PS00008   187->193   MYRISTYL            PDOC00008
PS00008   402->408   MYRISTYL            PDOC00008
PS00008   482->488   MYRISTYL            PDOC00008
PS00008   598->604   MYRISTYL            PDOC00008
PS00008   600->606   MYRISTYL            PDOC00008
PS00009   292->296   AMIDATION           PDOC00009
PS00009   499->503   AMIDATION           PDOC00009
PS00016   502->505   RGD                 PDOC00016
PS01160   223->265   KINESIN_LIGHT       PDOC00893
PS01160   265->307   KINESIN_LIGHT       PDOC00893
HMM_NAME Kinesin light chain repeat

HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN* 

+ALED+EKT+GHDHPDVATMLN+LALV+R+QNKY+E++ ++N 
Query 223 QALEDLEKTSGHDHPDVATMLNILALVYRDQNKYKEAAHLLN 264 

50.46 265 306 1 42 dkfzphtes3_4h6.3 strong similarity to Kinesin light chain

Alignment to HMM consensus: 
Query *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN* 
AL +REKTLG DHP VA LNNLA+++ ++KY+E+E + + 

dkfzphtes3 265 DALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKEAEPLCK 306

Query 348 1 42 dkfzphtes3_4h6.3 strong similarity to Kinesin light chain

Alignment to HMM consensus: 
HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN* 

RALE+REK+LG HPDVA++L+NLAL+C+NQ+K EEVE YY+ 
Query 307 RALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYR 348 

39.10 349 390 1 42 dkfzphtes3_4h6.3 strong similarity to Kinesin light chain

Alignment to HMM consensus:
Query *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN*

RALE+ LG D P+VA+ NNLA + Q+KY+++E +Y+

dkfzphtes3 349 RALEIYATRLGPDDPNVAKTKNNLASCYLKQGKYQDAETLYK 390
DKFZphtes3_4ol9



group: testes derived 

DKFZphtes3_4ol9 encodes a novel 1180 amino acid protein with weak similarity to human
megakaryocyte stimulating factor and human mucin.

The novel protein contains a cytochrome c family heme-binding site signature.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to megakaryocyte stimulating factor and mucin 

complete cDNA, complete cds, EST hits (few) 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 3767 bp 

Poly A stretch at pos. 3757, polyadenylation signal at pos. 3737 

1 GGCTAGGTTT AGCTTCAGGG GCAGCCCAGG GCAGTGTTGC TGCATATTGC 
51 ATGGATGAAA GGCTGAAGGC TGCCTCCTCT TGCAGGCTGG CTTCTGAGAT 

101 TGCACCTTCT TCTCCTGCTA CTCCTCCAAA TCTATGACCC TTCAAGGCAG 

151 AGCTGACCTG TCCGGTAATC AAGGCAATGC AGCCGGCCGC CTAGCTACAG 

201 TTCACGAGCC AGTTGTCACC CAGTGGGCGG TGCATCCTCC AGCCCCCGCT 

251 CACCCCAGTC TCCTGGACAA AATGGAGAAA GCGCCTCCAC AGCCCCAGCA 

301 CGAGGGCCTC AAGTCCAAGG AGCATCTTCC GCAACAGCCT GCCGAAGGCA 

351 AGACGGCGTC CCGCCGCGTC CCACGCCTCC GGGCTGTGGT CGAGAGCCAG 

401 GCCTTCAAGA ACATCCTGGT AGACGAGATG GACATGATGC ACGCCCGTGC 

451 AGCCACGCTC ATCCAAGCCA ACTGGAGGGG CTATTGGCTC CGGCAGAAGC 

501 TGATTTCCCA GATGATGGCG GCCAAGGCCA TCCAGGAGGC CTGGCGGCGC 

551 TTCAACAAGA GACACATCCT TCACTCCAGC AAGTCGTTGG TAAAGAAAAC 

601 GAGGGCGGAG GAGGGGGACA TACCTTATCA CGCCCCACAG CAGGTGCGCT 

651 TCCAGCATCC GGAAGAGAAC CGCCTTCTGT CCCCGCCCAT CATGGTGAAC 

701 AAGGAGACCC AGTTCCCTTC CTGTGACAAT CTGGTCCTCT GCAGACCCCA 

751 GTCGTCCCCC CTCCTGCAGC CCCCAGCAGC TCAGGGTACC CCAGAGCCCT 

801 GTGTGCAGGG TCCTCATGCT GCCAGAGTCC GGGGGCTGGC CTTCCTGCCA 

851 CACCAGACGG TCACCATCAG ATTTCCCTGC CCAGTGAGTT TGGACGCAAA 

901 ATGCCAGCCA TGCCTGCTGA CCAGAACCAT CAGAAGCACC TGCCTCGTCC 

951 ACATAGAGGG TGACTCAGTG AAGACCAAAC GTGTAAGTGC CCGGACCAAC 
1001 AAAGCCAGGG CTCCGGAGAC ACCATTGTCC AGAAGGTATG ACCAGGCAGT 
1051 TACGAGACCA TCCAGAGCCC AAACCCAGGG CCCTGTGAAA GCAGAGACCC 
1101 CCAAAGCCCC CTTCCAGATA TGTCCAGGGC CCATGATCAC CAAGACTCTA 
1151 CTCCAGACAT ATCCAGTGGT CTCCGTGACC CTGCCACAGA CATATCCAGC 
1201 GTCCACGATG ACCACCACCC CACCCAAGAC TAGCCCAGTT CCCAAAGTAA 
1251 CAATAATCAA GACCCCAGCC CAGATGTATC CGGGGCCCAC AGTGACCAAA
1301 ACTGCACCTC ACACATGCCC CATGCCCACA ATGACCAAGA TCCAGGTACA 
1351 CCCCACAGCC TCCAGAACTG GCACCCCACG GCAGACATGC CCTGCGACCA 
1401 TCACGGCAAA GAACCGACCT CAGGTTTCCC TTCTGGCTTC CATCATGAAG 
1451 AGCCTGCCCC AGGTATGCCC GGGGCCTGCG ATGGCAAAGA CCCCACCCCA
1501 GATGCACCCG GTCACCACCC CAGCCAAAAA CCCATTGCAA ACATGTCTGT 
1551 CAGCCACAAT GTCCAAGACT TCATCCCAGA GGAGCCCAGT TGGGGTGACC 
1601 AAGCCCTCAC CCCAGACCCG CCTGCCAGCC ATGATAACCA AGACCCCAGC 
1651 CCAGTTACGC TCGGTGGCCA CCATCCTCAA GACTCTGTGT CTGGCCTCTC 
1701 CAACAGTGGC AAATGTCAAG GCTCCACCCC AAGTGGCGGT AGCAGCCGGA 
1751 ACTCCCAACA CCTCAGGCTC CATCCATGAG AACCCACCCA AGGCCAAGGC
1801 CACCGTGAAT GTGAAGCAGG CTGCAAAGGT GGTGAAAGCC TCATCCCCCT 
1851 CCTATTTGGC TGAGGGGAAG ATCAGGTGCC TGGCTCAACC ACATCCGGGA 
1901 ACTGGGGTCC CCAGGGCTGC AGCTGAGCTT CCTTTGGAAG CCGAGAAAAT 
1951 CAAGACTGGC ACCCAGAAAC AGGCGAAAAC AGACATGGCA TTTAAGACCA 
2001 GTGTGGCAGT GGAAATGGCT GGGGCTCCAT CCTGGACAAA AGTTGCTGAG 
2051 GAAGGGGACA AGCCACCTCA CGTGTATGTG CCTGTAGACA TGGCTGTCAC 
2101 CCTGCCCCGG GGACAGCTGG CTGCCCCACT GACCAATGCC TCATCCCAGA
2151 GACATCCACC CTGCCTGTCC CAGAGACCAC TGGCCGCCCC GCTGACCAAG 
2201 GCCTCATCTC AGGGACATCT GCCCACTGAG CTGACCAAGA CCCCATCCCT 
2251 GGCCCATCTG GACACCTGTC TGAGCAAGAT GCATTCCCAG ACACATCTGG 
2301 CCACAGGTGC CGTGAAGGTC CAGTCCCAAG CGCCTCTAGC CACCTGTCTG 
2351 ACCAAGACGC AGTCCCGGGG GCAGCCGATC ACAGACATAA CCACGTGCCT 
2401 CATCCCAGCG CACCAGGCTG CTGATCTCAG CAGCAACACC CACTCCCAGG 
2451 TGCTCCTAAC AGGGTCCAAG GTGTCCAACC ACGCCTGCCA GCGCCTCGGT
2501 GGCCTCAGCG CCCCACCCTG GGCCAAGCCA GAGGACAGAC AGACCCAGCC 
2551 ACAGCCCCAC GGACACGTGC CGGGGAAGAC CACTCAGGGG GGACCATGCC 
2601 CGGCAGCCTG TGAGGTCCAG GGTATGCTGG TGCCGCCGAT GGCACCCACC 
2651 GGCCATTCCA CATGCAACGT TGAGTCCTGG GGAGACAACG GAGCCACACG
2701 TGCCCAGCCA TCAATGCCCG GCCAGGCGGT GCCCTGCCAG GAGGACACGG
2751 GCCCCGCGGA CGCTGGTGTG GTTGGTGGCC AATCGTGGAA CCGCGCATGG
2801 GAGCCAGCCA GGGGTGCTGC GTCCTGGGAC ACCTGGCGCA ACAAGGCGGT
2851 GGTGCCTCCC AGGCGGTCCG GGGAGCCAAT GGTGTCCATG CAGGCTGCAG
2901 AGGAGATCCG CATCCTCGCA GTGATCACTA TCCAGGCGGG CGTCCGTGGC
2951 TACCTGGCGC GTCGCAGGAT CCGGCTGTGG CACCGGGGGG CCATGGTCAT
3001 CCAAGCTACT TGGCGCGGCT ACCGTGTGCG GCGGAACCTG GCACACCTCT
3051 GCAGAGCCAC CACGACCATC CAGTCTGCCT GGCGCGGCTA CAGCACCCGC
3101 CGGGACCAAG CCCGGCACTG GCAGATGCTC CACCCCGTCA CGTGGGTGGA
3151 GCTGGGCAGC CGGGCCGGGG TCATGTCTGA CCGAAGCTGG TTCCAGGATG
3201 GCAGAGCCAG GACAGTATCT GACCATCGCT GCTTCCAGTC CTGCCAGGCA
3251 CACGCTTGCA GCGTCTGCCA CTCCCTGAGC TCCAGGATCG GGAGCCCGCC
3301 CAGCGTGGTG ATGCTAGTGG GCTCCAGCCC TCGCACCTGT CATACCTGTG
3351 GACGCACACA GCCCACCCGT GTGGTGCAGG GCATGGGCCA GGGCACTGAG
3401 GGCCCCGGGG CAGTGTCTTG GGCCTCCGCC TACCAGCTGG CTGCCCTGAG
3451 TCCCAGGCAG CCGCATCGCC AGGACAAAGC GGCCACAGCC ATCCAGTCCG
3501 CCTGGAGGGG CTTTAAGATC CGCCAGCAGA TGAGGCAGCA GCAAATGGCA
3551 GCGAAGATAG TTCAAGCCAC CTGGCGAGGC CACCATACCC GGAGCTGTCT
3601 GAAGAACACA GAGGCGCTCT TGGGACCAGC AGACCCCTCG GCCAGCTCAC
3651 GGCACATGCA TTGGCCTGGC ATCTAGGACC CTGGCTCCCT GCAGTGGGGA
3701 CTTCGTGGGA GGCACTCATG GCTCTCTGGG TCTAATGAAT AAAGTCCTCC
3751 ACAGCCTAAA AAAAAAA



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 134 bp to 3673 bp; peptide length: 1180 
Category: similarity to known protein 



1 MTLQGRADLS GNQGNAAGRL ATVHEPVVTQ WAVHPPAPAH PSLLDKMEKA 
51 PPQPQHEGLK SKEHLPQQPA EGKTASRRVP RLRAVVESQA FKNILVDEMD 
101 MMHARAATLI QANWRGYWLR QKLISQMMAA KAIQEAWRRF NKRHILHSSK 
151 SLVKKTRAEE GDIPYHAPQQ VRFQHPEENR LLSPPIMVNK ETQFPSCDNL 
201 VLCRPQSSPL LQPPAAQGTP EPCVQGPHAA RVRGLAFLPH QTVTIRFPCP 
251 VSLDAKCQPC LLTRTIRSTC LVHIEGDSVK TKRVSARTNK ARAPETPLSR 
301 RYDQAVTRPS RAQTQGPVKA ETPKAPFQIC PGPMITKTLL QTYPVVSVTL 
351 PQTYPASTMT TTPPKTSPVP KVTIIKTPAQ MYPGPTVTKT APHTCPMPTM 
401 TKIQVHPTAS RTGTPRQTCP ATITAKNRPQ VSLLASIMKS LPQVCPGPAM 
451 AKTPPQMHPV TTPAKNPLQT CLSATMSKTS SQRSPVGVTK PSPQTRLPAM
501 ITKTPAQLRS VATILKTLCL ASPTVANVKA PPQVAVAAGT PNTSGSIHEN 
551 PPKAKATVNV KQAAKVVKAS SPSYLAEGKI RCLAQPHPGT GVPRAAAELP 
601 LEAEKIKTGT QKQAKTDMAF KTSVAVEMAG APSWTKVAEE GDKPPHVYVP 
651 VDMAVTLPRG QLAAPLTNAS SQRHPPCLSQ RPLAAPLTKA SSQGHLPTEL 
701 TKTPSLAHLD TCLSKMHSQT HLATGAVKVQ SQAPLATCLT KTQSRGQPIT 
751 DITTCLIPAH QAADLSSNTH SQVLLTGSKV SNHACQRLGG LSAPPWAKPE 
801 DRQTQPQPHG HVPGKTTQGG PCPAACEVQG MLVPPMAPTG HSTCNVESWG
851 DNGATRAQPS MPGQAVPCQE DTGPADAGVV GGQSWNRAWE PARGAASWDT 
901 WRNKAVVPPR RSGEPMVSMQ AAEEIRILAV ITIQAGVRGY LARRRIRLWH 
951 RGAMVIQATW RGYRVRRNLA HLCRATTTIQ SAWRGYSTRR DQARHWQMLH 
1001 PVTWVELGSR AGVMSDRSWF QDGRARTVSD HRCFQSCQAH ACSVCHSLSS 
1051 RIGSPPSVVM LVGSSPRTCH TCGRTQPTRV VQGMGQGTEG PGAVSWASAY 
1101 QLAALSPRQP HRQDKAATAI QSAWRGFKIR QQMRQQQMAA KIVQATWRGH 
1151 HTRSCLKNTE ALLGPADPSA SSRHMHWPGI 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4ol9, frame 2 

TREMBL:HSU70136_1 product: "megakaryocyte stimulating factor"; Human
megakaryocyte stimulating factor mRNA, complete cds., N = 2, Score =
242, P = 9.6e-16

TREMBL:HSMUC2A_1 gene: "MUC2"; product: "mucin"; Human mucin-2 gene,
partial cds., N = 1, Score = 204, P = 1.4e-12

PIR:S48478 glucan 1,4-alpha-glucosidase (EC 3.2.1.3) - yeast
(Saccharomyces cerevisiae), N = 1, Score = 192, P = 9.6e-11



>TREMBL:HSU70136_1 product: "megakaryocyte stimulating factor"; Human
megakaryocyte stimulating factor mRNA, complete cds.
Length = 1,404

HSPs:

Score = 242 (36.3 bits), Expect = 9.6e-16, Sum P(2) = 9.6e-16
Identities = 145/546 (26%), Positives = 198/546 (36%)



Query: 282 KRVSARTNKARAPETPLSRRYDQAVTRPSRAQTQGPVKAETPKAPFQIC-PGPMITKTLL 340 

K+ + T K AP TP PS + P T AP P P TK+ 

Sbjct: 488 KKPAPTTPKEPAPTTP-KEPAPTTTKEPSPTTPKEPAPTTTKSAPTTTKEPAPTTTKSAP 546 

Query: 341 QTYPVVSVTLPQ TYPASTMTTTPPKTSPV-PKVTIIKTPAQMYPGPTVTKTAPHTC 395 

T S T + TP TTP K +P PK TP + P PT TK 
Sbjct: 547 TTPKEPSPTTTKEPAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKE — PAPTTTKK 599 

Query: 396 PMPTMTKIQVHPTASRTGTPRQTCPATITAKNRPQVSLLASIMKSLPQVCPGPAMAKTPP 455 

P PT K + PT TP++T P T LA P +A T P 

Sbjct: 600 PAPTAPK-EPAPT TPKETAPTTPKKLTPTTPEKLAPTTPEKPAPTTPEELAPTTP 653 

Query: 456 QMHPVTTPAKNPLQTCLSATMSKTSSQRSPVGVTKPSPQT-RLPAMIT-KTPAQLRSVAT 513 

+ TTP + P T A T + +P +P+P T + PA T K A T 
Sbjct: 654 EEPTPTTP-EEPAPTTPKAAAPNTPKEPAPTTPKEPAPTTPKEPAPTTPKETAPTTPKGT 712 

Query: 514 ILKTLCLASPTVANVKAPPQVAVAAG TPNTSGSIHENPPKAKATVNVKQAAKW-KA 569 

TL +PT AP ++A T TS PK A K+ A K 

Sbjct: 713 APTTLKEPAPTTPKKPAPKELAPTTTKEPTSTTSDKPAPTTPKGTAPTTPKEPAPTTPKE 772 

Query: 570 SSPSYLAEGKIRCLAQPHPGTGVPRAAAELPLEAEKIKTGT — QKQAKTDMAFKTSVAVE 627 

+p+ L+PPT A EL KTT KAT +T+ 

Sbjct: 773 PAPTTPKGTAPTTLKEPAPTTPKKPAPKELAPTTTKGPTSTTSDKPAPTTPK-ETAPTTP 831 

Query: 628 MAGAPSWTKVAEEGDKPPHVYVPVDMAVTLPRGQLAAPLTNASSQRHPPCLSQRPLAAPL 687 

AP+ K+ PPV+P ♦ S PLSPL 

Sbjct: 832 KEPAPTTPK--KPAPTTPETPPPTTSEVSTPTTTKEPTTIHKSPDESTPELSAEPTPKAL 889 

Query: 688 TKASSQGHLPTELTKTPSLA— HLDTCLSKMHSQTHLATGAVKVQSQAPLAT— CLTKTQ 743 

+ + +PT TKTP+ + T ++ L T + + AP T T T+ 

Sbjct: 890 ENSPKEPGVPT-- TKTPAATKPEMTTTAKDKTTERDLRT-TPETTTAAPKMTKETATTTE 946 

Query: 744 SRGQPITDITTCLIPAHQAADLS — SNTHSQVLLTGSKVSN — HACQRLGGLSAPP-WAK 798 

+ TT + + D + T + KV+ ++ P AK 

Sbjct: 947 KTTESKITATTTQVTSTTTQDTTPFKITTLKTTTLAPKVTTTKKTITTTEIMNKPEETAK 1006 

Query: 799 PEDRQTQPQPHGHVPGKTTQGGPCPAA 825 

P+DR T + P K T+ P + 

Sbjct: 1007 PKDRATNSKATTPKPQKPTKAPKKPTS 1033 

Score = 205 (30.8 bits), Expect = 3.1e-12, Sum P(2) = 3.1e-12
Identities = 146/565 (25%), Positives = 209/565 (36%) 



Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 



281 TKRVSARTNKARAPETPLSRRYDQAVTRPSRAQTQGPVKAE — TPKAPFQICPGPMITKT 338 

TK+ + K AP TP +ATP+ PK TP+ P P + T 

597 TKKPAPTAPKEPAPTTPK ETAPTTPKKLTPTTPEKLAPTTPEKPAPTTPEELAPTT 652 

339 LLQT YPVVSVTLPQTYPASTMTTTPPKTS PV- PKVTI I KTPAQMYPGPTVTK-TAPHTCP 396 

+ p T P + TP + +P PK TP + P PT K TAP T P 

653 PEEPTPTTPEEPAPTTPKAAAPNTPKEPAPTTPKEPAPTTPKE-- PAPTTPKETAP-TTP 709 

397 m pTMTKIQVHPTASRTGTPRQTCPATITAKNRPQVSLLASIMKSLPQVCPGPAMAKT 453 

PT K + PT + P++ PT +S+KP GAT 

710 KGTAPTTLK-EPAPTTPKKPAPKELAPTT TKEPTSTTSD — KPAPTTPKGTAPT-T 761 

454 P PQMH P VT T P A KN P LQTC L S ATMS KT S S QRS P VG VT K P S PQT RL P AM I T KT P AQL RS V AT 513 

P + P TTP KPT T T + +P KP+P+ P TK P S 
762 PKEPAP-TTP-KEPAPTTPKGTAPTTLKEPAPTTPKKPAPKELAPTT-TKGPTSTTSDKP 818 

514 I LKT LC L A S PT VANVKAP PQ V A VAAGT PNTSGSIHENPP KAKAT VNV KQAAKWKA 569 

T +PT AP APT E PP + V+ K+ + K+ 

819 APTTPKETAPTTPKEPAPTTPKKPA— PTTP ETPPPTTSEVSTPTTTKEPTTIHKS 872 

570 SSPSYLAEGKIRCLAQPHPGTGVPRAAAELPLEAEKI KTGTQKQAKTDMAFKTSVAV 626 

S+P AE + L GVP + P + T T K T+ +T+ 
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Sbjct: 873 PDESTPELSAEPTPKALENSPKEPGVP--TTKTPAATKPEMTTTAKDKTTERDLRTTPET 930 

Query: 627 EMAGAPSMTK-VAEEGDKPPHVYVPVDMAVTLPRGQLAAPLTNASSQRHPPCLSQRPLAA 685 

A AP TK A +K + +T Q+ + T ++ L LA 

Sbjct: 931 TTA-APKMTKETATTTEKT TESKITATTTQVTSTTTQDTTPFKITTLKTTTLAP 983 

Query: 686 PLTKASSQGHLPTELTKTPSLAHLDTCLSKMHSQTHLATGAVKVQS QAPLATCLT 740 

+T + + TE+ P +T K + AT K Q + P +T 

Sbjct: 984 KVT-TTKKTITTTEIMNKPE ETAKPKDRATNSKAT-TPKPQKPTKAPKKPTSTKKP 1037 

Query: 741 KTQSR-GQPITDIT TCLIPAHQAADLSSNTHSQVLLTGSKVSNHACQRLGGLSAPP 795 

KT R +P T T T+P + Q ++ N + S 

Sbjct: 1038 KTMPRVRKPKTTPTPRKMTSTMPELNPTSRIAEAMLQTTTRPNQTPNSKLVEVNPKSEDA 1097 

Query: 796 W-AKPEDRQTQPQPHGHVPGKTTQGGPCPAACEVQGMLVPPMAPTGHSTCN 845 

A+ E +PH +P T P QG+++ PM + CN 

Sbjct: 1098 GGAEGETPHMLLRPHVFMPEVTPDMDYLPRVPN-QGIIINPMLSDETNICN 1147 

Score = 198 (29.7 bits), Expect = 2.3e-11, Sum P(2) = 2.3e-11
Identities = 142/513 (27%), Positives = 200/513 (38%) 

Query: 204 RPQSSPLLQPPAAQGTPEPCVQGPHAARVRGLAFLPHQTVTIRFPCPVSLDAKCQPCLLT 263 

R + P +PP G + H V+ + + P L 

Sbjct: 207 RTKKKPTPKPPVVDEAGSGLDNGDFKVTTPDTSTTQHNKVSTSPKITTAKPINPRPSLPP 266 

Query: 264 R— TIRSTCLVHIEGDSVKTKRVSARTNKARAP- — ETPLSRRYDQAVTRPSR— AQTQ 315 

T + T L + +V+TK + TNK + E S + Q++ + S AT 
Sbjct: 267 NSDTSKETSLTVNKETTVETKETTT-TNKQTSTDGKEKTTSAKETQSIEKTSAKDLAPTS 325 

Query: 316 GPVKAETPKAPFQICPGPMITKTLLQTYPVVSVTLPQTYPASTMTTTPPKTSPVPKVTII 375 

+ TPKA GP +T T + P T P + PAST TP + +P + 
Sbjct: 326 KVLAKPTPKAE-TTTKGPALT-TPKEPTP TTPKE-PAST TPKEPTPTTIKSAP 375 

Query: 376 KTPAQMYPGPTVTKTAPHTC — PMPTMTKIQVHPTASRTGTPRQTC-PATITAKNRPQVS 432 

TP + P PT TK+AP T P PT TK + PT + P T PA T K+ P 
Sbjct: 376 TTPKE — PAPTTTKSAPTTPKEPAPTTTK-EPAPTTPKEPAPTTTKEPAPTTTKSAPTTP 432 

Query: 433 — LLASIMKSLPQVCPGPAMAKTPPQMHPVTTPAKNPLQTCLSATMSKTSSQRSPVGVT 489 

+ K P PA TP + P TTP KPT + T + +P 

Sbjct: 433 KEPAPTTPKKPAPTTPKEPAPT-TPKEPTP-TTP-KEPAPTTKEPAPT-TPKEPAPTAPK 488 

Query: 490 KPSPQT-RLPAMIT-KTPAQLRSVA TILK TLCLAS PTVANVKAPPQVAVAAGT 540 

KP+P- T + PA T K PA + T K T ++PT AP AT 

Sbjct: 489 KPAPTTPKEPAPTTPKEPAPTTTKEPSPTTPKEPAPTTTKSAPTTTKEPAPTTTKSAPTT 548 

Query: 541 PNT-SGSIHENP PKAKATVNVKQAAKVV-KASSPSYLAEGKIRCLAQPHPGTGVPR 594 

P S + + P PK A K+ A K +P+ E +P P P+ 

Sbjct: 549 PKEPSPTTTKEPAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKEPAPTTTKKPAPTA — PK 606 

Query: 595 AAAELPLEAEKIKTGTQKQAKTDMAFKTSVAVEMAGAPSWTK-VAEEGDKPPHVYVPVDM 653 

A* p ++ T K+ K + AP+ + +A + P P + 

Sbjct: 607 EPA— PTTPKETAPTTPKKLTPTTPEKLAPTTPEKPAPTTPEELAPTTPEEPTPTTPEEP 664 

Query: 654 AVTLPRGQLAAPLTNASSQRHP-PCLSQRPLAAPLTKASSQGHLPTELTKTPSLAHLDTC 712 

A T P+ AAP T +PP+PAPT PET T 

Sbjct: 665 APTTPKA— AAPNT PKEPAPTTPKEP — APTTPKEPAPTTPKETAPTTPKGTAPTT 716 

Query: 713 LSK 715 
L + 

Sbjct: 717 LKE 719 

Score = 108 (16.2 bits), Expect = 4.3e-02, Sum P(2) = 4.3e-02
Identities = 60/214 (28%), Positives = 85/214 (39%)



Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Score 



265 TIRSTCLVHIEGDSVKTKRVSAR-TNKA— RAPETP-LSRRYDQAVTRPSRAQTQGPVKA 320 

T + +H D T +SA T KA +P+ P + A T+P T 

862 TTKEPTTIHKSPDE-STPELSAEPTPKAuENSPKEPGVPTTKTPAATKPEMTTTAKDKTT 920 

321 ETP-- KAPFQICPGPMITK-TLLQTYPWSVTLPQTYPASTMTTTPPKTSPVPKVTIIKT 377 

E P P +TK T T + T T TTT T+P K+T +KT 

921 ERDLRTTPETTTAAPKMTKETATTTEKTTESKITATTTQVTSTTTQD-TTPF-KITTLKT 978 

378 PAQMYPGPTVTK TAPHTCPMPTMT-KIQVHPTASRTGTPRQTCPATITAKKRPQVSL 433 

+ P T TK T PTK+ TS+ TP+ P A +P + 

979 TT-LAPKVTTTKKTITTTEIMNKPEETAKPKDRATNSKATTPKPQKPTK — APKKPTSTK 1035 

4 34 LASIMKSL— PQVCPGPA-MAKTPPQMHPVTTPAKNPLQT 470 
M + P+ P P M T P+++P + A+ LOT 
1036 KPKTMPRVRKPKTTPTPRKMTSTMPELNPTSRIAEAMLQT 1075 

Score = 56 (8.4 bits), Expect = 3.1e-12, Sum P(2) = 3.1e-12
Identities = 17/60 (28%), Positives = 22/60 (36%)

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKS-KEHLPQQPAEGKTASRRVP 80

T EP T P P PS E AP P+ + K+ P P E + + P 

Sbjct: 533 TTKEPAPTTTKSAPTTPKEPSPTTTKEPAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKEP 592 

Score = 52 (7.8 bits), Expect = 9.6e-16, Sum P(2) = 9.6e-16
Identities = 17/59 (28%), Positives = 22/59 (37%)

Query: 22 TVHEPV-VTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAE-GKTASRR 78 

T EP T P P P+ E P P+ +KE P P E TA ++ 

Sbjct: 431 TPKEPAPTTPKKPAPTTPKEPAPTTPKEPTPTTPKEPAPTTKEPAPTTPKEPAPTAPKK 489

Score = 51 (7.7 bits), Expect = 1.2e-15, Sum P(2) = 1.2e-15
Identities = 15/51 (29%), Positives = 19/51 (37%)

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKS-KEHLPQQPAE 71

T EP T P P P + + AP P+ + KE P P E 

Sbjct: 416 TTKEPAPTTTKSAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKEPTPTTPKE 466 

Score = 47 (7.1 bits), Expect = 3.2e-15, Sum P(2) = 3.2e-15
Identities = 12/41 (29%), Positives = 17/41 (41%)

Query: 36 PAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAEGKTAS 76 

P P P + P +P +KS P++PA T S 

Sbjct: 350 PTPTTPK--EPASTTPKEPTPTTIKSAPTTPKEPAPTTTKS 388 

Score = 47 (7.1 bits), Expect = 3.2e-15, Sum P(2) = 3.2e-15
Identities = 15/57 (26%), Positives = 19/57 (33%)

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEG-LKSKEHLPQQPAEGKTASR 77

T EP T P P P+ E AP P+ +KE P T + 

Sbjct: 377 TPKEPAPTTTKSAPTTPKEPAPTTTKEPAPTTPKEPAPTTTKEPAPTTTKSAPTTPK 433 

Score = 46 (6.9 bits), Expect = 4.0e-15, Sum P(2) = 4.0e-15
Identities = 16/58 (27%), Positives = 22/58 (37%)

Query: 20 LATVHEPVVT---QWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAEGKT 74

L T EP T + A P P+ + P +P KS P++PA T

Sbjct: 344 LTTPKEPTPTTPKEPASTTPKEPTPTTIKSAPTTPKEPAPTTTKSAPTTPKEPAPTTT 401

Score = 42 (6.3 bits), Expect = 1.0e-14, Sum P(2) = 1.0e-14
Identities = 15/60 (25%), Positives = 21/60 (35%)

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKS-KEHLPQQPAEGKTASRRVP 80

T EP T P P P+ + AP P+ + KE P E + + P

Sbjct: 463 TPKEPAPTTKEPAPTTPKEPAPTAPKKPAPTTPKEPAPTTPKEPAPTTTKEPSPTTPKEP 522

Score = 39 (5.9 bits), Expect = 2.1e-14, Sum P(2) = 2.1e-14 
Identities = 15/55 (27%), Positives = 20/55 (36%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAEGKTAS 76 

T EP T P PA + + P +P KS ++PA T S 

Sbjct: 494 TPKEPAPTT----PKEPAPTTTKEPSPTTPKEPAPTTTKSAPTTTKEPAPTTTKS 544



Pedant information for DKFZphtes3_4ol9, frame 2 



Report for DKFZphtes3_4ol9.2



[LENGTH] 1180

[MW] 127693.40

[pI] 10.25

[HOMOL] SWISSPROT:MUC2_HUMAN MUCIN 2 PRECURSOR (INTESTINAL MUCIN 2). 1e-08

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR151C] 6e-06

[FUNCAT] 30.01 organization of cell wall [S. cerevisiae, YIR019c] 6e-06

[FUNCAT] 30.90 extracellular/secretion proteins [S. cerevisiae, YIR019c] 6e-06

[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YIR019c] 6e-06

[BLOCKS] BL00412B Neuromodulin (GAP-43) proteins

[PROSITE] CYTOCHROME_C 1

[PROSITE] MYRISTYL 12

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 8

[PROSITE] PKC_PHOSPHO_SITE 25

[PROSITE] ASN_GLYCOSYLATION 2

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 5.00 %
SEQ MTLQGRADLSGNQGNAAGRLATVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLK

SEG 

PRD cccccceeeccccccceeeeeeeeceeeeeeeecccccccceeeeccccccccccccccc 

SEQ SKEHLPQQPAEGKTASRRVPRLRAVVESQAFKNILVDEMDMMHARAATLIQANWRGYWLR

SEG 

PRD cccccccccccccccccchhhhhhhhhhhhhhheeehhhhhhhhhhhhhhhhhccchhhh 

SEQ QKLISQMMAAKAIQEAWRRFNKRHILHSSKSLVKKTRAEEGDIPYHAPQQVRFQHPEENR 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhheeeecccchhhhhhhhcccccccccceeeecccccce 

SEQ LLSPPIMVNKETQFPSCDNLVLCRPQSSPLLQPPAAQGTPEPCVQGPHAARVRGLAFLPH

SEG

PRD eeccceeeecccccccccceeeecccccccccccccccccccccccccceeeeeeeeccc 

SEQ QTVTIRFPCPVSLDAKCQPCLLTRTIRSTCLVHIEGDSVKTKRVSARTNKARAPETPLSR 

SEG 

PRD eeeeeecccccccccccccccccccccceeeeecccccccceeeeecccccccccccccc 

SEQ RYDQAVTRPSRAQTQGPVKAETPKAPFQICPGPMITKTLLQTYPVVSVTLPQTYPASTMT

SEG xxxx 

PRD ccceeeeeccccccccceeecccccccccccccccccccccccccccccccccccccccc 

SEQ TTPPKTSPVPKVTIIKTPAQMYPGPTVTKTAPHTCPMPTMTKIQVHPTASRTGTPRQTCP 

SEG xxxxxxxxxxxxx 

PRD cccccccccccceeeccccccccccccccccccccccccccceeeccccccccccccccc 

SEQ ATITAKNRPQVSLLASIMKSLPQVCPGPAMAKTPPQMHPVTTPAKNPLQTCLSATMSKTS 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SQRSPVGVTKPSPQTRLPAMITKTPAQLRSVATILKTLCLASPTVANVKAPPQVAVAAGT 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PNTSGSIHENPPKAKATVNVKQAAKVVKASSPSYLAEGKIRCLAQPHPGTGVPRAAAELP

SEG xxxxxxxxxxxxxxxxx xxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ LEAEKIKTGTQKQAKTDMAFKTSVAVEMAGAPSWTKVAEEGDKPPHVYVPVDMAVTLPRG

SEG xxxx 

PRD ccccccccccccccccccccccccccccccccccceeeeccccccceeeccccccccccc 

SEQ QLAAPLTNASSQRHPPCLSQRPLAAPLTKASSQGHLPTELTKTPSLAHLDTCLSKMHSQT

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HLATGAVKVQSQAPLATCLTKTQSRGQPITDITTCLIPAHQAADLSSNTHSQVLLTGSKV 

SEG 

PRD ccccceeeeeccccceeeeccccccccccccccccccccccccccccccceeeeeccccc 

SEQ SNHACQRLGGLSAPPWAKPEDRQTQPQPHGHVPGKTTQGGPCPAACEVQGMLVPPMAPTG 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HSTCNVESWGDNGATRAQPSMPGQAVPCQEDTGPADAGVVGGQSWNRAWEPARGAASWDT 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ WRNKAWPFRRSGEPMVSMQAAEEIRILAVITIQAGVRGYLARRRIRLWHRGAMVIQATW 

SEG 

PRD ccceeecccccccccchhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhh 

SEQ RGYRVRRNLAHLCRATTTIQSAWRGYSTRRDQARHWQMLHPVTWVELGSRAGVMSDRSWF 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeeeeccchhhhhhhhhhh 

SEQ QDGRARTVSDHRCFQSCQAHACSVCHSLSSRIGSPPSVVMLVGSSPRTCHTCGRTQPTRV 

SEG 

PRD hccceeeeccceeeecccceeeeeeeecccccccccceeeeeecccccccccccccccee 

SEQ VQGMGQGTSGPGAVSWASAYQLAALSPRQPHRQDKAATAIQSAWRGFKIRQQMRQQQMAA 

SEG xxxxxxxxxxxxx 

PRD eeeccccccccccchhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KIVQATWRGHHTRSCLKNTEALLGPADPSASSRHMHWPGI 

SEG xx 

PRD hhhhhhhccccccchhhhhhhhcccccccccccccccccc 
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Prosite for DKFZphtes3_4ol9. 2 



PS00001   542->546     ASN_GLYCOSYLATION   PDOC00001
PS00001   668->672     ASN_GLYCOSYLATION   PDOC00001
PS00004   282->286     CAMP_PHOSPHO_SITE   PDOC00004
PS00005   76->79       PKC_PHOSPHO_SITE    PDOC00005
PS00005   148->151     PKC_PHOSPHO_SITE    PDOC00005
PS00005   244->247     PKC_PHOSPHO_SITE    PDOC00005
PS00005   265->268     PKC_PHOSPHO_SITE    PDOC00005
PS00005   278->281     PKC_PHOSPHO_SITE    PDOC00005
PS00005   281->284     PKC_PHOSPHO_SITE    PDOC00005
PS00005   285->288     PKC_PHOSPHO_SITE    PDOC00005
PS00005   288->291     PKC_PHOSPHO_SITE    PDOC00005
PS00005   299->302     PKC_PHOSPHO_SITE    PDOC00005
PS00005   322->325     PKC_PHOSPHO_SITE    PDOC00005
PS00005   414->417     PKC_PHOSPHO_SITE    PDOC00005
PS00005   424->427     PKC_PHOSPHO_SITE    PDOC00005
PS00005   481->484     PKC_PHOSPHO_SITE    PDOC00005
PS00005   610->613     PKC_PHOSPHO_SITE    PDOC00005
PS00005   671->674     PKC_PHOSPHO_SITE    PDOC00005
PS00005   679->682     PKC_PHOSPHO_SITE    PDOC00005
PS00005   900->903     PKC_PHOSPHO_SITE    PDOC00005
PS00005   959->962     PKC_PHOSPHO_SITE    PDOC00005
PS00005   987->990     PKC_PHOSPHO_SITE    PDOC00005
PS00005   1015->1018   PKC_PHOSPHO_SITE    PDOC00005
PS00005   1049->1052   PKC_PHOSPHO_SITE    PDOC00005
PS00005   1065->1068   PKC_PHOSPHO_SITE    PDOC00005
PS00005   1106->1109   PKC_PHOSPHO_SITE    PDOC00005
PS00005   1146->1149   PKC_PHOSPHO_SITE    PDOC00005
PS00005   1171->1174   PKC_PHOSPHO_SITE    PDOC00005
PS00006   22->26       CK2_PHOSPHO_SITE    PDOC00006
PS00006   42->46       CK2_PHOSPHO_SITE    PDOC00006
PS00006   156->160     CK2_PHOSPHO_SITE    PDOC00006
PS00006   546->550     CK2_PHOSPHO_SITE    PDOC00006
PS00006   843->852     CK2_PHOSPHO_SITE    PDOC00006
PS00006   988->992     CK2_PHOSPHO_SITE    PDOC00006
PS00006   1003->1007   CK2_PHOSPHO_SITE    PDOC00006
PS00006   1027->1031   CK2_PHOSPHO_SITE    PDOC00006
PS00008   11->17       MYRISTYL            PDOC00008
PS00008   14->20       MYRISTYL            PDOC00008
PS00008   539->545     MYRISTYL            PDOC00008
PS00008   591->597     MYRISTYL            PDOC00008
PS00008   746->752     MYRISTYL            PDOC00008
PS00008   777->783     MYRISTYL            PDOC00008
PS00008   853->859     MYRISTYL            PDOC00008
PS00008   878->884     MYRISTYL            PDOC00008
PS00008   882->888     MYRISTYL            PDOC00008
PS00008   1008->1014   MYRISTYL            PDOC00008
PS00008   1053->1059   MYRISTYL            PDOC00008
PS00008   1083->1089   MYRISTYL            PDOC00008
PS00190   1042->1048   CYTOCHROME_C        PDOC00169



(No Pfam data available for DKFZphtes3_4ol9.2) 
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DKFZphtes3_50j4 



group: testes derived 

DKFZphtes3_50j4 encodes a novel 187 amino acid proline-rich protein. 
No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown, proline-rich protein 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1186 bp 

Poly A stretch at pos. 1176, polyadenylation signal at pos. 1126 
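The poly(A) annotation above can be sanity-checked with a small script. The sketch below is not the pipeline used for the original annotation; it assumes the signal search looks for the hexamers AATAAA and ATTAAA and that reported positions are 0-based offsets, which is an inference from the numbers in this listing rather than something the source states. The names `find_polya_signals`, `find_polya_tail`, `TAIL` and `TAIL_OFFSET` are illustrative.

```python
import re

# Hexamers assumed (not confirmed by the source) to count as polyadenylation signals.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signals(seq):
    """Return (0-based offset, hexamer) for each signal hexamer found in seq."""
    seq = "".join(seq.split()).upper()
    # A lookahead is used so that overlapping hexamers are all reported.
    pattern = "(?=(" + "|".join(POLYA_SIGNALS) + "))"
    return [(m.start(), m.group(1)) for m in re.finditer(pattern, seq)]


def find_polya_tail(seq, min_run=8):
    """Return the 0-based offset of a terminal run of at least min_run A's, or None."""
    seq = "".join(seq.split()).upper()
    m = re.search(r"A{%d,}$" % min_run, seq)
    return m.start() if m else None


# Last 86 bases of the DKFZphtes3_50j4 insert (listing rows 1101 and 1151).
TAIL = ("CAGGCCTGAC AGATGTTTGG GAGAGGAATA AAGTTGTGTT GTTGTGGGGC"
        "ATGCAGGCGT GCACACAGCC CTTTTCAAAA AAAAAA")
TAIL_OFFSET = 1100  # number of bases preceding TAIL in the full insert

if __name__ == "__main__":
    for offset, hexamer in find_polya_signals(TAIL):
        print(hexamer, "at", TAIL_OFFSET + offset)
    print("poly(A) stretch at", TAIL_OFFSET + find_polya_tail(TAIL))
```

Under these assumptions the script reports the signal at 1126 and the poly(A) stretch at 1176, matching the annotation line.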



1 CACTGGGCGT CTGAAGCTCA GAGCTCACCC CTGAGATGGG CTCTCCTAGG 
51 CCTCCTGGGA TGAGGGAGCC ACCAGGACCC AGTGCTGTGA TGCCTGCTCT 
101 TCCCTCTACC AGCACCTGCC CGCCCAGAGA CCAGGGCACC CCTGAAGTCC 
151 AGCCCACCCC TGCAAAGGAC ACATGGAAGG GCAAGCGGCC TCGATCCCAG 
201 CAGGAGAACC CAGAGAGCCA GCCTCAGAAG AGGCCACGCC CCTCAGCCAA 
251 GCCCTCCGTC GTAGCTGAGG TCAAGGGCAG CGTCTCGGCC AGCGAACAGG 
301 GCACCTTGAA TCCCACGGCT CAAGACCCCT TCCAGCTCTC CGCTCCTGGC 
351 GTCTCCTTGA AGGAGGCTGC AAATGTTGTG GTCAAGTGCC TCACCCCTTT 
401 CTACAAGGAG GGCAAGTTTG CTTCCAAGGA GTTGTTTAAA GGCTTTGCCC 
451 GCCACCTCTC ACACTTGCTG ACTCAGAAGA CCTCTCCTGG AAGGAGCGTG 
501 AAAGAAGAGG CCCAGAACCT CATCAGGCAC TTCTTCCATG GCCGGGCCCG 
551 GTGCGAGAGC GAAGCTGACT GGCATGGCCT GTGTGGCCCC CAGAGATGAC 
601 CAACTGCTGG CTGGGCAGGG CCCGCGTCCT CCCCCAGATT CTAGCATGGG 
651 TCATCCTGGG CCTCACCTGC TGATGCCAGG GCCATCGTCT TTTCTCAGTC 
701 CTTCTCCTTT CCAACCATAC TTGGCTTTGG GGATGACCCC AGACACCCCC 
751 TGAATCCAGG TCAGAGGTCA GCCCACCTTT CTTTCTGCTT GCAAAGCCTA 
801 TAGACCCTTC TCAGAGCGGT CCTCATGGCT GGGTTTTCTG GGACACATGT 
851 CGAGGACAGA AGGTGGAGGG TGGTGGAGCT GCTGCTGGAA GAAGGGGAAG 
901 GAAGAGTGGC CCCTCCCCGA GTTCTAAGTC AGGATGAGGC CCACCTGTCC 
951 AAGGTATCGG AACCTACCCA GGGGACCCTC AGATCCTCCA CCCACTCCCC 
1001 CATCCATTAC GATGCCAGCT TCCAGCCTTG CCCAGGTCAG AGCTGTGGCA 
1051 GAGGAGAGGC AGCCAGGCCC TGTTCCTGCT CAGCTCCTGC TCAGGAAGGC 
1101 CAGGCCTGAC AGATGTTTGG GAGAGGAATA AAGTTGTGTT GTTGTGGGGC 
1151 ATGCAGGCGT GCACACAGCC CTTTTCAAAA AAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 36 bp to 596 bp; peptide length: 187 
Category: putative protein 
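The ORF coordinates, peptide length and reading frame quoted in these reports are mutually consistent, which can be checked with simple arithmetic. The sketch below assumes that the ORF coordinates are 1-based and inclusive and that the stop codon is excluded from the range; both are inferences from the numbers in this listing. The function name `orf_summary` is illustrative.

```python
def orf_summary(start, end):
    """Summarise an ORF given 1-based inclusive nucleotide coordinates.

    Returns the number of codons (peptide length when the stop codon is
    excluded from the range) and the forward reading frame (1, 2 or 3).
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length %d is not a multiple of 3" % length)
    return {"codons": length // 3, "frame": (start - 1) % 3 + 1}


if __name__ == "__main__":
    # Coordinates taken from the three reports in this section.
    for name, start, end in [("DKFZphtes3_50j4", 36, 596),
                             ("DKFZphtes3_50n06", 302, 859),
                             ("DKFZphtes3_50n23", 22, 1518)]:
        print(name, orf_summary(start, end))
```

For the three clones this gives 187, 186 and 499 codons in frames 3, 2 and 1 respectively, in agreement with the reported peptide lengths and frames.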



1 MGSPRPPGMR EPPGPSAVMP ALPSTSTCPP RDQGTPEVQP TPAKDTWKGK 

51 RPRSQQENPE SQPQKRPRPS AKPSVVAEVK GSVSASEQGT LNPTAQDPFQ 

101 LSAPGVSLKE AANVVVKCLT PFYKEGKFAS KELFKGFARH LSHLLTQKTS 

151 PGRSVKEEAQ NLIRHFFHGR ARCESEADWH GLCGPQR 
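The peptide listing can be cross-checked against the nucleotide sequence by translating the ORF. The following is a minimal sketch using the standard genetic code, not the tool used for the original annotation; `translate` and `ORF_START` are illustrative names, and `ORF_START` holds the first 30 bases of the ORF (positions 36 to 65 of the insert) copied from the listing above.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA (whitespace ignored) from its first base up to the first stop."""
    dna = "".join(dna.split()).upper()
    peptide = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":  # stop codon ends the peptide
            break
        peptide.append(residue)
    return "".join(peptide)


# First 30 bases of the DKFZphtes3_50j4 ORF, starting at the ATG at position 36.
ORF_START = "ATGGGCTCTCCTAGGCCTCCTGGGATGAGG"

if __name__ == "__main__":
    print(translate(ORF_START))  # MGSPRPPGMR
```

The same check on bases 286 to 315 (GTCGTAGCTGAGGTCAAG...) yields VVAEVK, which is how the doubled valines in the second and third peptide rows can be confirmed.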

BLASTP hits 

Entry MMU92455_1 from database TREMBL: 
product: "WW domain binding protein 7"; Mus musculus WW domain binding 
protein 7 mRNA, partial cds. 

Score = 134, P = 6.9e-08, identities = 45/125, positives = 56/125 



Alert BLASTP hits for DKFZphtes3_50j4, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_50j4, frame 3 

Report for DKFZphtes3_50j4.3 

[LENGTH] 187 

[MW] 20353.06 

[pI] 9.76 

[PROSITE] MYRISTYL 1 

[PROSITE] AMIDATION 1 

[PROSITE] CK2_PHOSPHO_SITE 6 

[PROSITE] PKC_PHOSPHO_SITE 6 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 8.56 % 

SEQ MGSPRPPGMREPPGPSAVMPALPSTSTCPPRDQGTPEVQPTPAKDTWKGKRPRSQQENPE 

SEG xxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SQPQKRPRPSAKPSVVAEVKGSVSASEQGTLNPTAQDPFQLSAPGVSLKEAANVVVKCLT 

SEG 

PRD cccccccccccccchhhhhccccccccccccccccccccccccccccchhhhhhheeecc 

SEQ PFYKEGKFASKELFKGFARHLSHLLTQKTSPGRSVKEEAQNLIRHFFHGRARCESEADWH 

SEG 

PRD cccccccchhhhhhhhhhhhhhhhheeecccccchhhhhhhhhhhhhhccchhhhhhhhh 

SEQ GLCGPQR 

SEG 

PRD ccccccc 
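The LOW_COMPLEXITY percentage in the Pedant report appears to be the fraction of residues masked with "x" on the SEG rows. The sketch below works under that assumption (it is inferred from the numbers in this listing, not documented by the source); `low_complexity_percent` is an illustrative name.

```python
def low_complexity_percent(seg_rows, length):
    """Percentage of residues masked as low complexity.

    seg_rows: iterable of SEG mask strings, where 'x' marks a masked residue.
    length:   total peptide length in residues.
    """
    masked = sum(row.lower().count("x") for row in seg_rows)
    return round(100.0 * masked / length, 2)


if __name__ == "__main__":
    # DKFZphtes3_50j4.3: a single masked run of 16 residues in a 187-residue peptide.
    print(low_complexity_percent(["xxxxxxxxxxxxxxxx", "", "", ""], 187))
```

With 16 masked residues out of 187 this gives 8.56, the value shown in the report; the 10 masked residues of the 186-residue DKFZphtes3_50n06 peptide likewise give 5.38.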



Prosite for DKFZphtes3_50j4.3 

PS00005   3->6       PKC_PHOSPHO_SITE   PDOC00005
PS00005   46->49     PKC_PHOSPHO_SITE   PDOC00005
PS00005   70->73     PKC_PHOSPHO_SITE   PDOC00005
PS00005   107->110   PKC_PHOSPHO_SITE   PDOC00005
PS00005   146->149   PKC_PHOSPHO_SITE   PDOC00005
PS00005   154->157   PKC_PHOSPHO_SITE   PDOC00005
PS00006   54->58     CK2_PHOSPHO_SITE   PDOC00006
PS00006   84->88     CK2_PHOSPHO_SITE   PDOC00006
PS00006   94->98     CK2_PHOSPHO_SITE   PDOC00006
PS00006   107->111   CK2_PHOSPHO_SITE   PDOC00006
PS00006   154->158   CK2_PHOSPHO_SITE   PDOC00006
PS00006   175->179   CK2_PHOSPHO_SITE   PDOC00006
PS00008   81->87     MYRISTYL           PDOC00008
PS00009   48->52     AMIDATION          PDOC00009
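The phosphorylation-site hits listed for DKFZphtes3_50j4.3 can be reproduced with a simple pattern scan. The sketch below uses the PROSITE patterns for PS00005 ([ST]-x-[RK]) and PS00006 ([ST]-x(2)-[DE]) rewritten as regular expressions; it is not the scanner used for the original report, and it assumes the reported "start->end" ranges use a 1-based start with an exclusive end, which is an inference from this listing. `scan_sites`, `PATTERNS` and `PEPTIDE` are illustrative names.

```python
import re

# PROSITE patterns rewritten as regular expressions.
PATTERNS = {
    "PKC_PHOSPHO_SITE": "[ST].[RK]",    # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": "[ST]..[DE]",   # PS00006: [ST]-x(2)-[DE]
}

# 187-residue peptide of DKFZphtes3_50j4, frame 3, as listed in the report.
PEPTIDE = (
    "MGSPRPPGMREPPGPSAVMPALPSTSTCPPRDQGTPEVQPTPAKDTWKGK"
    "RPRSQQENPESQPQKRPRPSAKPSVVAEVKGSVSASEQGTLNPTAQDPFQ"
    "LSAPGVSLKEAANVVVKCLTPFYKEGKFASKELFKGFARHLSHLLTQKTS"
    "PGRSVKEEAQNLIRHFFHGRARCESEADWHGLCGPQR"
)


def scan_sites(peptide, regex):
    """Return 1-based start positions of all (possibly overlapping) matches."""
    # The lookahead makes the scan report overlapping sites as well.
    return [m.start() + 1 for m in re.finditer("(?=" + regex + ")", peptide)]


if __name__ == "__main__":
    for name, regex in PATTERNS.items():
        print(name, scan_sites(PEPTIDE, regex))
```

The scan returns starts 3, 46, 70, 107, 146, 154 for the PKC pattern and 54, 84, 94, 107, 154, 175 for the CK2 pattern, matching the table above.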



(No Pfam data available for DKFZphtes3_50j4.3) 
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DKFZphtes3_50n06 



group: testes derived 

DKFZphtes3_50n06 encodes a novel 186 amino acid protein without similarity to known proteins. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1095 bp 

Poly A stretch at pos. 1085, polyadenylation signal at pos. 1061 



1 CAAGACCCTC GGAGCCAAGA AACAACACTG AGTTCCAGAT TTCGGAAGGT 
51 TCACGAGTGT TGCCGACACG CCCTCCCAAC TGCAGACATC CTCCCTGGAG 
101 GACCTGCTGT GCTCACATGC CCCCCTGTCC AGCGAGGACG ACACCTCCCC 
151 GGGCTGTGCA GCCCCCTCCC AGGCACCCTT CAAGGCCTTC CTCAGTCCCC 
201 CAGAGCCACA TAGCCACCGA GGCACCGACA GGAAGCTGTC CCCGCTCCTG 
251 AGCCCCTTGC AAGACTCACT GGTGGACAAG ACCCTGCTGG AGCCCAGGGA 
301 GATGGTCCGG CCTAAGAAGG TGTGTTTCTC GGAGAGCAGC CTGCCCACCG 
351 GGGACAGGAC CAGGAGGAGC TACTACCTCA ATGAGATCCA GAGCTTCGCG 
401 GGCGCCGAGA AGGACGCGCG CGTGGTGGGC GAGATCGCCT TCCAGCTGGA 
451 CCGCCGCATC CTGGCCTACG TGTTCCCGGG CGTGACGCGG CTCTACGGCT 
501 TCACGGTGGC CAACATCCCC GAGAAGATCG AGCAGACCTC CACCAAGTCT 
551 CTGGACGGCT CCGTGGACGA GAGGAAGCTG CGCGAGCTGA CGCAGCGCTA 
601 CCTGGCCCTG AGCGCGCGCC TGGAGAAGCT GGGCTACAGC CGCGACGTGC 
651 ACCCGGCGTT CAGCGAGTTC CTCATCAACA CCTACGGAAT CCTGAAGCAG 
701 CGGCCCGACC TGCGCGCCAA CCCCCTGCAC AGCAGCCCGG CCGCGCTGCG 
751 CAAGCTGGTC ATCGACGTGG TGCCCCCCAA GTTCCTGGGC GACTCGCTGC 
801 TGCTGCTCAA CTGCCTGTGC GAGCTCTCCA AGGAGGACGG CAAGCCCCTC 
851 TTCGCCTGGT GAGCCGCCCC GCGCCCGCCG CCTTGCCTGC AGTAAACGCG 
901 TTTGTTCCAA CCCGGGGCCG CGGTGCCTCC TGCGCGTCCC CCCGGAGGGG 
951 AAAGGGCCGC GTCCCCCGCG CGCGAGGCCA GAGAAGGCCC CGCTCCCACC 
1001 GGTGCTGGGC CCCGACCGCA GCCCGCCGCT GCCCGCACCT GCGGAGTGCT 
1051 TCTCACCCCT CATTAAAATC ATCCGTTTGC TTGTCAAAAA AAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 302 bp to 859 bp; peptide length: 186 
Category: putative protein 
Classification: no clue 



1 MVRPKKVCFS ESSLPTGDRT RRSYYLNEIQ SFAGAEKDAR VVGEIAFQLD 
51 RRILAYVFPG VTRLYGFTVA NIPEKIEQTS TKSLDGSVDE RKLRELTQRY 
101 LALSARLEKL GYSRDVHPAF SEFLINTYGI LKQRPDLRAN PLHSSPAALR 
151 KLVIDVVPPK FLGDSLLLLN CLCELSKEDG KPLFAW 

BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphtes3_50n06, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_50n06, frame 2 

Report for DKFZphtes3_50n06.2 

[LENGTH] 186 

[MW] 21049.39 

[pI] 9.28 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 5.38 % 

SEQ MVRPKKVCFSESSLPTGDRTRRSYYLNEIQSFAGAEKDARVVGEIAFQLDRRILAYVFPG 

SEG 

PRD ccccceeeccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 

SEQ VTRLYGFTVANIPEKIEQTSTKSLDGSVDERKLRELTQRYLALSARLEKLGYSRDVHPAF 

SEG 

PRD ceeeeeeeeeeccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhccccccccch 

SEQ SEFLINTYGILKQRPDLRANPLHSSPAALRKLVIDVVPPKFLGDSLLLLNCLCELSKEDG 

SEG xxxxxxxxxx 

PRD hhhhhhcceeecccccccccccccchhhhhhhhhhccccccccchhhhhhhhhhhhcccc 

SEQ KPLFAW 

SEG 

PRD cccccc 



(No Prosite data available for DKFZphtes3_50n06.2) 
(No Pfam data available for DKFZphtes3_50n06.2) 
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group: testes derived 

DKFZphtes3_50n23 encodes a novel 499 amino acid protein without similarity to known proteins. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 
2 EST hits 

(from other testis libraries) testis specific cDNA? 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1907 bp 

Poly A stretch at pos. 1897, polyadenylation signal at pos. 1872 



1 GGGCACCAGC CACTTTCCAC CATGACTGTG CGCTCGAGGG TCGCAGATGT 
51 GTTCGGCAGC AAGGACACTG AGAGCCTTGA GCCTGTGCTT TTACCCTTAG 
101 TAGATCGCAG GTTTCCTAAG AAATGGGAAA GACCGGTGGC AGAAAGCTTA 
151 GGCCACAAAG ACAAAGACCA GGAGGACTAC TTCCAGAAGG GAGGACTCCA 
201 AATTAAGTTC CACTGTAGCA AGCAGCTGTC TCTAGAGAGC TCCAGGCAGG 
251 TGACCTCTGA GAGCCAAGAG GAGCCCTGGG AGGAGGAATT CGGCCGGGAG 
301 ATGCGGAGGC AGCTGTGGCT GGAGGAGGAG GAGATGTGGC AGCAGCGGCA 
351 GAAGAAGTGG GCCCTGCTGG AGCAGGAGCA TCAGGAGAAG CTGCGGCAGT 
401 GGAATCTGGA AGACCTGGCC AGGGAGCAAC AGCGGAGATG GGTCCAGCTA 
451 GAAAAGGAGC AGGAGAGCCC ACGGAGAGAG CCAGAGCAGC TAGGGGAGGA 
501 TGTGGAGAGG AGGATCTTCA CACCCACCAG TCGATGGAGG GACTTGGAGA 
551 AGGCAGAGCT ATCATTAGTG CCTGCCCCAA GCCGGACCCA ATCTGCTCAC 
601 CAAAGCAGGA GGCCACACTT GCCCATGTCT CCTAGTACCC AGCAGCCTGC 
651 CCTGGGAAAG CAGAGACCTA TGAGTTCAGT GGAGTTTACC TACAGACCAC 
701 GGACCCGCCG AGTTCCCACA AAGCCCAAGA AATCTGCCTC CTTTCCTGTC 
751 ACTGGGACAT CCATCCGAAG GCTGACCTGG CCCTCTTTGC AGATATCCCC 
801 TGCAAATATT AAGAAGAAGG TGTACCACAT GGACATGGAG GCCCAGAGGA 
851 AGAACCTGCA GCTCCTGAGT GAGGAGTCTG AGTTGAGGCT GCCCCACTAC 
901 CTGCGCAGCA AAGCACTGGA GCTCACCACC ACCACCATGG AGCTGGGCGC 
951 GCTCAGGCTG CAGTACCTGT GCCATAAGTA CATCTTCTAT AGACGCCTCC 
1001 AGAGCCTCCG GCAAGAAGCG ATCAACCATG TACAAATCAT GAAAGAAACG 
1051 GAGGCTTCCT ACAAGGCCCA GAACCTCTAC ATCTTCCTGG AAAACATTGA 
1101 CCGCCTGCAG AGTCTCAGGC TGCAGGCCTG GACGGACAAG CAGAAGGGGC 
1151 TGGAGGAGAA GCACCGAGAG TGCCTGAGCA GCATGGTGAC CATGTTCCCC 
1201 AAGCTCCAGC TGGAGTGGAA CGTTCACCTG AACATCCCTG AGGTCACCTC 
1251 GCCAAAGCCA AAGAAATGCA AGTTGCCTGC AGCCTCACCC CGGCACATCC 
1301 GCCCCAGTGG CCCCACCTAC AAGCAGCCCT TTCTGTCTAG GCACCGGGCA 
1351 TGTGTGCCCC TGCAGATGGC CCGCCAACAG GGGAAGCAGA TGGAGGCTGT 
1401 CTGGAAGACC GAGGTGGCCT CCTCCAGTTA CGCAATAGAA AAAAAGACCC 
1451 CTGCCAGCCT TCCCCGGGAC CAGCTGAGGG GACACCCAGA TATTCCCCGG 
1501 CTGTTGACAC TGGACGTGTA GTCCTCCTGC CACAAAAGCC TGAACTTCCT 
1551 GAAGGCCCAG TAAGCGCCTC AGCGAACCAA AGGAAGGAAT GCCAGGAACC 
1601 TACAAATGAA TCCGCTTAGC TTGTTCAAAA AAAGTCAAGC GAGTCACTCC 
1651 CTGGAACCCA AATAAGCCAG AAGGATCAAG ACAGCCCCAG TCTCCACTGC 
1701 ATCCCTCAGC CAGTGATTCT CAACCTTCTG AGGGACGGAA ACCCACAGAG 
1751 AACTTGGTCA AAATGCAGGT TCCCAGCTGG TGCTTTTAAA GAAACCCTCT 
1801 GGGGGTTGCT GAGTACTCCT AGAACTTTGA GAAACACTGC TTCCCTCCTG 
1851 CAGTCCCCAA ACTCTACATT TTAATAAAAT AGAGGTTGGT TTATTTTAAA 
1901 AAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 
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Peptide information for frame 1 



ORF from 22 bp to 1518 bp; peptide length: 499 
Category: similarity to known protein 
Classification: no clue 



1 MTVRSRVADV FGSKDTESLE PVLLPLVDRR FPKKWERPVA ESLGHKDKDQ 

51 EDYFQKGGLQ IKFHCSKQLS LESSRQVTSE SQEEPWEEEF GREMRRQLWL 

101 EEEEMWQQRQ KKWALLEQEH QEKLRQWNLE DLAREQQRRW VQLEKEQESP 

151 RREPEQLGED VERRIFTPTS RWRDLEKAEL SLVPAPSRTQ SAHQSRRPHL 

201 PMSPSTQQPA LGKQRPMSSV EFTYRPRTRR VPTKPKKSAS FPVTGTSIRR 

251 LTWPSLQISP ANIKKKVYHM DMEAQRKNLQ LLSEESELRL PHYLRSKALE 

301 LTTTTMELGA LRLQYLCHKY IFYRRLQSLR QEAINHVQIM KETEASYKAQ 

351 NLYIFLENID RLQSLRLQAW TDKQKGLEEK HRECLSSMVT MFPKLQLEWN 

401 VHLNIPEVTS PKPKKCKLPA ASPRHIRPSG PTYKQPFLSR HRACVPLQMA 

451 RQQGKQMEAV WKTEVASSSY AIEKKTPASL PRDQLRGHPD IPRLLTLDV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_50n23, frame 1 

PIR:S28589 trichohyalin - rabbit, N = 1, Score = 134, P = 5.3e-05 

TREMBLNEW:AF132479_1 product: "Ese2L protein"; Mus musculus Ese2L 
protein mRNA, complete cds., N = 1, Score = 130, P = 0.00017 



>PIR:S28589 trichohyalin - rabbit 
Length = 1,407 

HSPs: 



Score = 134 (20.1 bits), Expect = 5.3e-05, P = 5.3e-05 
Identities = 88/354 (24%), Positives = 154/354 (43%) 
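The percentages shown next to the identity and positive counts can be recomputed from the raw fractions. The sketch below assumes the BLAST report truncates rather than rounds the percentage, which is inferred from the figures in this listing (88/354 is 24.9 % but is reported as 24 %); `hsp_percent` is an illustrative name.

```python
def hsp_percent(matches, aligned_length):
    """Integer percentage of matching columns in an HSP, truncated toward zero."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return 100 * matches // aligned_length


if __name__ == "__main__":
    # First HSP against PIR:S28589 (trichohyalin, rabbit).
    print("identities:", hsp_percent(88, 354), "%")
    print("positives: ", hsp_percent(154, 354), "%")
```

For the first HSP this prints 24 and 43, as reported; the low identity over a long, glutamate- and arginine-rich stretch is consistent with the summary statement that the BLAST results are not informative.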



Query: 29 RRFPKKWERPVAESLGHKDKDQEDYFQKGGLQIK-FHCSKQLSLESSRQVTSESQEEPWE 87 
R++ k +R + L + ++E ++ G + F +QL +++ E +EE + 
Sbjct: 165 RQYRDKEQRLQRQELEERRAEEEQLRRRKGRDAEEFIEEEQLRRREQQELKRELREEEQQ 224 

Query: 88 EEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRRWVQLEKEQ 147 
RE + L+EEE RQ++W E Q++LR+ LE++ RE+++R Q E+ + 
Sbjct: 225 RRERREQHERA-LQEEEEQLLRQRRWRE-EPREQQQLRR-ELEEI-REREQRLEQEERRE 280 

Query: 148 ESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQSRRPHLPMSPSTQ 207 
+ RRE ++L E ERR ++ + EL RQQR + + 
Sbjct: 281 QQLRRE-QRL-EQEERREQQLRRELEEIREREQRLEQEERREQRLEQEERREQQLKRELE 338 

Query: 208 QPALGKQRPMSSVEFTYRPRTRRVPTKPKKSASFPVTGTSIRRLTWPSLQISPANIKK-K 266 
+ +QR +E RR+++++A GS+RW SA++K 
Sbjct: 339 EIREREQR LEQEER-REQLLAEEVREQAR-- ERGESLTR-RWQRQLESEAGARQSK 390 

Query: 267 VYHMDMEAQRKNLQLLSEESELRLPHYLRSKALELTTTTM ELGALRLQYLCHKY 320 
VY +R+ QL++ER R+LE E RQL+ 
Sbjct: 391 VYS RPRRQEEQSLRQDQERR-QRQERERELEEQARRQQQWQAEEESERRRQRLSARP 446 

Query: 321 IFYRRLQSLRQEAINHVQIMKETEASYKAQNLYI-FLENIDRLQSL-RLQAWTDKQKGLE 378 
R Q +E Q+EE ++ + FLE ++LQ RQ ++ E 
Sbjct: 447 SLRER-QLRAEERQEQEQRFREEEEQRRERRQELQFLEEEEQLQRRERAQQLQEEDSFQE 505 

Query: 379 EKHR 382 
++ R 
Sbjct: 506 DRER 509 




Score = 119 (17.9 bits), Expect = 2.2e-03, P = 2.2e-03 
Identities = 79/357 (22%), Positives = 150/357 (42%) 

Query: 33 KKWERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTSESQEEPWEEEFGR 92 

++ E+ + + K +++E Q+ + + +Q R+ + + + EE+F + 

Sbjct: 990 RREEQELRQERDRKFREEEQLLQE REEERLRRQERDRKFREEERQLRRQELEEQFRQ 1046 

Query: 93 EMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRRWVQLEKEQESPRR 152 

E R+ LEE+ + Q++++K L QE K R+ E+ R +Q R QL +E++ R 
Sbjct: 1047 ERDRKFRLEEQ-IRQEKEEK-QLRRQERDRKFRE EEQQRRRQEREQQLRRERDRKFR 1101 

Query: 153 EPEQLGEDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQSR--RPHLPMSPSTQQPA 210 
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Sbjct : 


1102 




211 


chirf ■ 

ODJU t • 


1101 


Query : 


&oo 


Sbjct: 


1221 


Query: 


317 


Sbjct: 


1281 


Query: 


376 


Sbjct: 


1339 


E EQL ++ E 

L +Q R + 

R R L + E L+ 

+ R 

+ R R 

+R+ 

E Q 

-RKNLQLLS-EESELRLPHYLRSKALELTTTTMELGALRLQYL 316 
R+ QLL EE ELR + + E E LR Q 

+ L E 

++ +E + Y+A+ + 

RL+ LR + 

E K RE 
SRERKFRE 1346 

Score = 109 (16.4 bits), Expect = 1.9e-01, P = 1.7e-01 
Identities = 37/113 (32%), Positives = 60/113 (53%) 



Query: 67 KQLSLESSRQVTSESQ — EEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKL 124 

+QL E R+ E Q +E EE R+ R + EEE++ Q+R+++ L QE + KL 
Sbjct: 764 QQLRRERDRKFREEEQLLQEREEERLRRQERERKLREEEQLLQEREEE-RLRRQERERKL 822 

Query: 125 RQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAE 179 

R+ EL +E++ ++ +E+E RE EQL E+ + R R L + E 

Sbjct: 823 REE--EQLLQEREEERLR-RQERERKLREEEQLLRQEEQEL — RQERARKLREEE 872 

Score = 107 (16.1 bits), Expect = 3.0e-01, P = 2.6e-01 
Identities = 35/109 (32%), Positives = 61/109 (55%) 

Query: 71 LESSRQVTSESQEEPWE-EEFGREMRRQL WLEEEEMWQQRQKKWALLEQEHQEKLRQ 126 

L Q+ ES+EE +E +++RR+ + EEE++ Q+R+++ L QE + KLR+ 
Sbjct: 742 LREEEQLLQESEEERLRRQEREQQLRRERDRKFREEEQLLQEREEE - RLRRQERERKLRE 800 

Query: 127 WNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAE 179 

EL +E++ ++ +E+E RE EQL ++ E R R L + E 

Sbjct: 801 E — EQLLQEREEERLR-RQERERKLREEEQLLQEREEERLRRQERERKLREEE 850 

Score = 104 (15.6 bits), Expect = 9.4e-02, P = 9.0e-02 
Identities = 84/339 (24%), Positives = 149/339 (43%) 



Query: 67 KQLSLESSRQVTSESQEEPWEEEFGREMRRQL-WLEEEEMWQQRQKKWALLEQE -- HQEK 123 
+QL E ++ +EE EE RE R++L +LEEEE Q+R++ L E++ +++ 
Sbjct: 451 RQLRAEERQEQEQRFREE-- EEQRRERRQELQFLEEEEQLQRRERAQQLQEEDSFQEDR 507 

Query: 124 LRQWNLEDLAREQQRRWVQLEKEQESPRR EP EQLGEDVE-RRIFTPTSRWRDL 175 
R+ ++ Q RW QL++E + R +P EQL E+ E +R R R+ 
Sbjct: 508 ERRRRQQEQRPGQTWRW-QLQEEAQRRRHTLYAKPGQQEQLREEEELQREKRRQEREREY 566 

Query: 176 EKAELSLVPAPSRTQSAHQSRRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRT RRV 231 
+ EL + + R+ + Q+L + R+ E + R RR 
Sbjct: 567 REEE-KLQREEDEKRRRQERERQYRELEELRQEEQL-RDRKLREEEQLLQEREEERLRRQ 624 

Query: 232 PTKPK KSAS FPVTGTS IRRLTWPSLQI S PAN I KKKV YHMDMEAQRK NLQLLSEE 285 
+ K + +R+ L+ ++++ + E +RK QLL E 
Sbjct: 625 ERERKLREEEQLLRQEEQELRQERERKLREEEQLLRREEQELRQERERKLREEEQLLQER 684 

Query: 286 SELRL PHY LRS KALE LTTTTMELGALRLQYLCHKYI FYRRL-QSLRQEAINHV- - 337 
E RL R++ L L ELR + L+ RR Q LRQE + 
Sbjct: 685 EEERLRRQERARKLREEEQLLRQEEQELRQERERKLREEEQLLRREEQLLRQERDRKLRE 744 

Query: 338 -- QIMKETEASYKAQNLYIFLENIDRLQSLRLQAWTDKQKGLEEKHRECL 385 
Q+++E+E + E +L+ R + + ++++ L+E+ E L 
Sbjct: 745 EEQLLQESEEERLRRQ EREQQLRRERDRK FREEEQLLQE REEERL 789 

Score = 103 (15.5 bits), Expect = 1.2e-01, P = 1.1e-01 
Identities = 42/152 (27%), Positives = 74/152 (48%) 

Query: 36 ERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTSESQEEPWEEEFG-REM 94 

ER + K +++E ++ +++ +++L E + + E QE E + RE 
Sbjct: 835 ERLRRQERERKLREEEQLLRQEEQELRQERARKLR-EEEQLLRQEEQELRQERDRKLREE 893 

Query: 95 RRQLWLEEEEMWQQRQKKWA LLEQEHQEKLRQWNLEDLAREQQ RRWVQ-LEKE 146 

+ L EE+E+ Q+R +K LL++ +E+LR+ E RE++ RR Q L +E 

Sbjct: 894 EQLLRQEEQELRQERDRKLREEEQLLQESEEERLRRQERERKLREEEQLLRREEQELRRE 953 

Query: 147 QESPRREPEQLGEDVERRIFTPTSRWRDLEKAE 179 
+ RE EQL ++ E R R L + E 
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Sbjct: 954 RARKLREEEQLLQEREEERLRRQERARKLREEE 986 

Score = 103 (15.5 bits), Expect = 7.8e-01, P = 5.4e-01 
Identities = 31/91 (34%), Positives = 52/91 (57%) 

Query: 67 KQLSLESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQ 126 

++L E R++ E Q EE+ R+ R + EEE++ Q+R+++ L QE KLR+ 
Sbjct: 642 QELRQERERKLREEEQLLRREEQELRQERERKLREEEQLLQEREEE-RLRRQERARKLRE 700 

Query: 127 WNLEDLAREQQRRWVQLEKEQESPRREPEQL 157 

E L R++++ +L +E+E RE EQL 
Sbjct: 701 E — EQLLRQEEQ ELRQERERKLREEEQL 726 

Score = 101 (15.2 bits), Expect = 2.0e-01, P = 1.8e-01 
Identities = 38/111 (34%), Positives = 57/111 (51%) 

Query: 72 ESSRQVTSESQEEPWEE-EFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLE 130 

E R++ E Q EE E RE R+L EEE++ Q+R+++ L QE KLR+ + 
Sbjct: 931 ERERKLREEEQLLRREEQELRRERARKL-REEEQLLQEREEE-RLRRQERARKLREEE-Q 987 

Query: 131 DLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSL 182 

L RE+Q +L +E++ RE EQL ■*■+ E R R + E L 

Sbjct: 988 LLRREEQ ELRQERDRKFREEEQLLQEREEERLRRQERDRKFREEERQL 1035 

Score = 101 (15.2 bits), Expect = 1.3e+00, P = 7.2e-01 
Identities = 33/108 (30%), Positives = 56/108 (51%) 

Query: 72 ESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED 131 

E R++ E Q EE+ R+ R + EEE++ +Q +++ L QE KLR+ E 
Sbjct: 841 ERERKLREEEQLLRQEEQELRQERARKLREEEQLLRQEEQE LRQERDRKLREE — EQ 895 

Query: 132 LA REQQ RRWVQL EKEQES PRREPEQLGEDVERRI FT PTSRWRDLEKAE 179 

L R++++ +L +E++ RE EQL + + E R R L + E 

Sbjct: 896 LLRQEEQ ELRQERDRKLREEEQLLQES EEERLRRQERERKLREEE 940 

Score = 99 (14.9 bits), Expect = 2.0e+00, P = 8.7e-01 
Identities = 32/97 (32%), Positives = 50/97 (51%) 

Query: 72 ESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED 131 

E R+ E Q EE E R L SEE Q +++ L QE + KLR+ E 
Sbjct:. 578 EKRRRQERERQYRELEELRQEEQLRDRKLREEEQLLQEREEERLRRQERERKLREE--EQ 635 

Query: 132 LAREQ Q RRWVQL EKEQES PRREPEQLGEDVERRI 165 

L R++ Q R +L +E++ RRE ++L ++ ER++ 

Sbjct: 636 LLRQEEQELRQERERKLREEEQLLRREEQELRQERERKL 674 

Score = 99 (14.9 bits), Expect = 2.0e+00, P = 8.7e-01 
Identities = 34/111 (30%), Positives = 58/111 (52%) 

Query: 67 KQLS L E S S RQV TSESQ--EEPWEEEFG REMRRQ LWL E EEEMWQQRQKKW ALLEQE H QE K L 124 

++L E R++ E Q +E EE R+ R + EEE++ +Q +++ L QE + KL 
Sbjct: 664 QELRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQE LRQERERKL 720 

Query: 125 RQWNLE DLAREQQRRWVQLEKEQES PRREPEQLGEDVERRI FT PTSRWRDLEK 177 

R+ + L RE+Q L +E++ RE EQL ++ E R + L + 

Sbjct: 721 REEE-QLLRREEQL LRQERDRKLREEEQLLQESEEERLRRQEREQQLRR 768 

Score = 98 (14.7 bits), Expect = 2.6e+00, P = 9.2e-01 
Identities = 37/146 (25%), Positives = 77/146 (52%) 

Query: 20 EPVLLPLVDRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTS 79 

E LL ++ ++ ER + E + +E+ ++ K +QL + +++ 

Sbjct: 655 EEQLLRREEQELRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQELRQ 714 

Query: 80 ESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED-LAREQQR 138 

E + + EEE + +RR+ L +E ++ +++ LL++ +E+LR+ E L RE+ R 
Sbjct: 715 ERERKLREEE — QLLRREEQLLRQERDRKLRE EEQLLQES EE ERLRRQEREQQLRRERDR 772 

Query: 139 RWVQLEKEQES PRRE PEQLG- EDVERRI 165 

++ E+EQ RE E+L ++ ER++ 
Sbjct: 773 KF — REEEQLLQEREEERLRRQERERKL 798 

Score = 97 (14.6 bits), Expect = 3.3e+00, P = 9.6e-01 
Identities = 38/129 (29%), Positives = 63/129 (48%) 

Query: 72 ESSRQVTSESQ-- EEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNL 129 

E R++ E Q +E EE R+ R + EEE++ +Q +++ L QE KLR+ 
Sbjct: 817 ERERKLREEEQLLQEREEERLRRQERERKLREEEQLLRQEEQE LRQERARKLREE — 871 

Query: 130 E DLAREQQRRWVQLEKEQES PRREPEQLGEDVERRI FT PTSRWRDLEKAELSLVPAPSRT 189 
E L R++++ +L +E++ RE EQL E+ + R R L + E L+ 
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Sbjct: 872 EQLLRQEEQ-- -ELRQERDRKLREEEQLLRQEEQEL-- RQERDRKLREEE-QLLQESEEE 925 

Query: 190 QSAHQSRRPHL 200 

+ Q R L 
Sbjct: 926 RLRRQERERKL 936 

Score = 96 (14.4 bits), Expect = 4.1e+00, P = 9.8e-01 
Identities = 41/132 (31%), Positives = 69/132 (52%) 



Query: 46 KDKDQEDYFQKGGLQI-KFHCSKQLSLESSRQVTSESQEEPWEEEFGREMRRQLWLEEEE 104 
+++ QE F + Q+ + ++QL ESQ E + E+ G+ R QL +EE 
Sbjct: 473 RERRQELQFLEEEEQLQRRERAQQLQEEDSFQEDRERRRRQQEQRPGQTWRWQL QEE 529 

Query: 105 MWQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERR 164 
++R +A Q QE+LR+ E+L RE++R+ E+E+E E Q ED +RR 
Sbjct: 530 AQRRRHTLYAKPGQ -- QEQLREE -- EELQREKRRQ EREREYREEEKLQREEDEKRR 581 

Query: 165 IFTPTSRWRDLEK 177 
++R+LE+ 
Sbjct: 582 RQERERQYRELEE 594 



Score = 96 (14.4 bits), Expect = 4.1e+00, P = 9.8e-01 
Identities = 35/138 (25%), Positives = 76/138 (55%) 



Query: 28 DRRFPKKWERPVAESL-GHKDKDQEDYFQKGGLQI KFHCSKQLSLESSRQVTS ESQEEPW 86 
+R++ + E ELK +++E Q+ + ++ L Q+ + ++E 
Sbjct: 586 ERQYRELEELRQEEQLRDRKLREEEQLLQEREEERLRRQERERKLREEEQLLRQEEQE-L 644 

Query: 87 E E E FG REMRRQLWL EE E EMWQQRQKKW ALLEQEHQE KLRQWNL E DLAREQQRRW VQL 143 
+E R++R + L EE+E+ Q+R++K L +E Q L++ E L R+++ R +L 
Sbjct: 645 RQERERKLREEEQLLRREEQELRQERERK LREEEQ-LLQEREEERLRRQERAR -- KL 698 

Query: 144 EKEQESPRREPEQLGEDVERRI 165 
+E++ R+E ++L + + ER++ 
Sbjct: 699 REEEQLLRQEEQELRQERERKL 720 



Score = 95 (14.3 bits), Expect = 5.2e+00, P = 9.9e-01 
Identities = 59/282 (20%), Positives = 121/282 (42%) 

Query: 20 EPVLLPLVDRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQI KFHCSKQLSLESSRQVTS 79 

E LL ++ ++ ER + E + +E+ ++ K +QL + +++ 

Sbjct: 655 EEQLLRREEQELRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQELRQ 714 

Query: 80 ESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED-LAREQQR 138 

E + + EEE + +RR+ L +E ++ +++ LL++ +E+LR+ E L RE+ R 
Sbjct: 715 ERERKLREEE— QLLRREEQLLRQERDRKLREEEQLLQESEEERLRRQEREQQLRRERDR 772 

Query: 139 RWVQLEKEQESPRREPEQLG-EDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQ — S 195 

++ E+EQ RE E+L ++ ER++ ++ E+ L + + Q 

Sbjct: 773 KF — REEEQLLQEREEERLRRQERERKLREEEQLLQEREEERLRRQERERKLREEEQLLQ 830 

Query: 196 RRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRTRRVPTKPKKSASFPVTGTSIRRLTWPS 255 

r -f ++ L ++ + E R R ++ +R+ 

Sbjct: 831 EREEERLRRQERERKLREEEQLLRQE-EQELRQERARKLREEEQLLRQEEQELRQERDRK 889 

Query: 256 LQISPANIKKKVYHMDMEAQRK- — NLQLLSEESELRLPHYLRSKAL 299 

L+ ++++ + E RK QLL E E RL R + L 

Sbjct: 890 LREEEQLLRQEEQELRQERDRKLREEEQLLQESEEERLRRQERERKL 936 

Score = 94 (14.1 bits), Expect = 1.1e+00, P = 6.8e-01 
Identities = 35/116 (30%), Positives = 59/116 (50%) 



Query: 72 ESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEK L 124 
E +R++ E Q EE+ R+ R + + EEE++ Q+R+++ L QE K L 
Sbjct: 977 ERARKLREEEQLLRREEQELRQERDRKFREEEQLLQEREEE-RLRRQERDRKFREEERQL 1035 

Query: 125 RQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSL 182 
R+ LE+ R+++ R +LE EQ +E +QL R F + R + + E L 
Sbjct: 1036 RRQELEEQFRQERDRKFRLE-EQIRQEKEEKQLRRQERDRKFREEEQQRRRQEREQQL 1092 



Score = 94 (14.1 bits), Expect = 1.1e+00, P = 6.8e-01 
Identities = 51/166 (30%), Positives = 76/166 (45%) 

Query: 67 KQLSLESSRQVTSESQ-- SEPWEEEFGREMR-RQLWLEEEEMWQQRQKKWALLEQEHQEK 123 

++L E R+ E Q +£ EE R+ R R+L EEE++ + Q++ L QE+ 
Sbjct: 1250 QELRRERDRKFRE E EQLLQEREEERL RRQERARKLREEEEQLL FE EQE EQRL RQER 1305 

Query: 124 LRQWNLED-LAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSL 182 

R++ E+ ARE++ R +LE+E R+E EQ R F R E+ E 

Sbjct: 1306 DRRYRAEEQFAREEKSR--RLEREL RQEEEQRRRRERERKFREEQLRRQQEE-EQRR 1359 
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Query: 183 VPAPSRTQSAHQSRRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRTRRVP 232 

R QSRR L P T+Q ARE* R++ P 

Sbjct: 1360 RQLRERQFREDQSRRQVL — EPGTRQFARVPVRSSPLYEYIQEQRSQYRP 1407 

Score = 93 (14.0 bits), Expect = 8.3e+00, P = 1.0e+00 
Identities = 41/145 (28%), Positives = 72/145 (49%) 

Query: 28 DRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTSESQEEPW- 86 

+RR ++ER + E + + Q+ + Q+ L R +QE+ + 

Sbjct: 408 ERRQRQERERELEEQARRQQQWQAEEESERRRQ-RLSARPSLRERQLRAEERQEQEQRFR 466 

Query: 87 -EEEFGREMRRQL-WLEEEEMWQQRQKKWALLEQE — HQEKLRQWNLEDLAREQQRRWVQ 142 

EEE RE R++L +LEEEE Q+R++ L E++ +++ R+ ++ Q RW Q 
Sbjct: 467 EEEEQRRERRQELQFLEEEEQLQRRERAQQLQEEDSFQEDRERRRRQQEQRPGQTWRW-Q 525 

Query: 143 LEKEQESPRR EP EQLGEDVE 162 

L++E + R +P EQL E+ E 

Sbjct: 526 LQEEAQRRRHTLYAKPGQQEQLREEEE 552 

Score = 91 (13.7 bits), Expect = 2.4e+00, P = 9.1e-01 
Identities = 38/110 (34%), Positives = 57/110 (51%) 

Query: 72 ESSRQVTSESQEEPWEE-EFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNL- 129 

E R++ E Q EE E RE R+L EEE+ + Q+R+++ L QE KLR+ 
Sbjct: 931 ERERKLREEEQLLRREEQELRRERARKL-REEEQLLQEREEE-RLRRQERARKLREEEQL 988 

Query: 130 EDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAEL 180 

++L +E+ R++ E+EQ RE E+L R F R L + EL 

Sbjct: 989 LRREEQELRQERDRKF — REEEQLLQEREEERLRRQERDRKFREEER — QLRRQEL 1040 

Score = 89 (13.4 bits), Expect = 2.2e+00, P = 8.9e-01 
Identities = 35/138 (25%), Positives = 65/138 (47%) 

Query: 82 QEEPWEEEFGREMRRQLWLEEEEM--WQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRR 139 

Q E++ E+R + + +E E WQ+++++ L E+E Q K R+ + +R+ + + 
Sbjct: 111 QNRRQEDQRRFELRDRQFEDEPERRRWQKQEQERELAEEEEQRKKRERFEQHYSRQYRDK 170 

Query: 140 WVQLEKEQ-ESPRREPEQL GEDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQ 194 

+L++++ ERE EQL GDEF +RE+EL Q + 

Sbjct: 171 EQRLQRQELEERRAEEEQLRRRKGRDAEE — FI EEEQLRRREQQELKR-ELREEEQQRRE 227 

Query: 195 SRRPHLPMSPSTQQPALGKQR 215 

R H ++ L ++R 

Sbjct: 228 RREQHERALQEEEEQLLRQRR 248 

Score = 50 (7.5 bits), Expect = 2.2e+00, P = 8.9e-01 
Identities = 34/160 (21%), Positives = 67/160 (41%) 

Query: 325 RLQSLRQEAINHVQIMKETEASYKAQNLYIFLENIDRL-QSLRLQAWTDKQKGLEEKHRE 383 

R +• R+E Q+ +E E + + LE +R Q LR + +♦++ E++ R 

Sbjct: 245 RQRRWREEPREQQQLRRELEEI REREQR LEQEERREQQLRREQRLEQEERREQQLRR 301 

Query: 384 CLSSMVTMFPKLQLEWNVHLNIP-EVTSPKPKKCKLPAASPRHIRPSGPTYKQPFLSRHR 442 

L ♦ +L+ E + E + K +L R R ++ L+ 

Sbjct: 302 ELEEIREREQRLEQEERREQRLEQEERREQQLKRELEEIREREQRLEQEERREQLLAEEV 361 

Query: 443 AC V P LQMARQQGKQMEA VWKTE V AS S S YAI EKKT P AS LP RDQ 484 

+ AR++G+ + W+ ++ S + A + K S PR Q 
Sbjct: 362 R EQ ARE RG ES LT RRWQRQL E S EAG ARQS KV - Y S RP RRQ 398 

Score = 40 (6.0 bits), Expect = 1.9e-01, P = 1.7e-01 
Identities = 32/115 (27%), Positives = 47/115 (40%) 

Query: 27 6 RKNLQLLSEESELRLPHYLRSKAL — ELTTTTMELGALRLQYLCHKYIFYRRL-QSLRQE 332 

R+ QLL E E RL R++ L S E LR Q K+ +L Q +E 

Sbjct: 959 REEEQLLQEREEERLRRQERARKLREEEQLLRREEQELR-QERDRKFREEEQLLQEREEE 1017 

Query 333 AINHVQI MKETEASYKAQNLYI-FLENIDRLQSLRLQAWTDKQ-KGLEEKHRE 383 

+ + +E E + Q L F + DR L Q +K+ K L + R+ 

Sbjct: 1018 RLRRQERDRKFREEERQLRRQELEEQFRQERDRKFRLEEQIRQEKEEKQLRRQERD 1073 

Score = 37 (5.6 bits), Expect = 1.6e+00, P = 7.9e-01 
Identities = 27/108 (25%), Positives = 43/108 (39%) 

Query: 27 6 RKNLQLLSEESELRLPHYLRSKAL ELTTTTMELGALRLQYLCHKYIFYRRLQSLRQE 332 

R+ QLL E E RL R + L E E LR Q K R+LQE 
Sbjct: 775 REEEQLLQEREEERLRRQERERKLREEEQLLQEREEERLRRQERERKL REEEQLLQE 831 

Query: 333 AINHVQIMKETEASYKAQKLYIFLENIDRLQSLRLQAWTDKQKGLEEKHRE 383 

+EE ++ +E L+ R + +♦++ L ++ +E 
Sbjct: 832 REEERLRRQERERKLREEEQLLRQEE-QELRQERARKLREEEQLLRQEEQE 881 
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Pedant information for DKFZphtes3_50n23, frame 1 



Report for DKFZphtes3_50n23.1 

[LENGTH] 499 

[MW] 58885.69 

[pI] 9.67 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 10.42 % 

SEQ MTVRSRVADVFGSKDTESLEPVLLPLVDRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQ 

SEG 

PRD ccccccceeecccccccccceeeccccccccccccchhhhhhhcccccccccccccccce 

SEQ IKFHCSKQLSLESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEH 

SEG xxxxxxxxxx . . xxxxxxxxxxxxxxxxxxx 

PRD eeeecchhhhhhccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ QEKLRQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAEL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccceeeccccccchhhhhhhh 

SEQ SLVPAPSRTQSAHQSRRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRTRRVPTKPKKSAS 

SEG xxxxxxxxxxxxxxx. . . 

PRD hccccccchhhhhccccccccccccccccccccccccceeeeeeccccccccccccceee 

SEQ FPVTGTSIRRLTWPSLQISPANIKKKVYHMDMEAQRKNLQLLSEESELRLPHYLRSKALE 

SEG xxxxxxxx 

PRD ecccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ LTTTTMELGALRLQYLCHKYIFYRRLQSLRQEAINHVQIMKETEASYKAQNLYIFLENID 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ RLQSLRLQAWTDKQKGLEEKHRECLSSMVTMFPKLQLEWNVHLNIPEVTSPKPKKCKLPA 

SEG 

PRD hhhhhhhhhhhhcchhhhhhhhhhhhhhhhccccchhhhhcccccccccccccccccccc 

SEQ ASPRHIRPSGPTYKQPFLSRHRACVPLQMARQQGKQMEAVWKTEVASSSYAIEKKTPASL 

SEG 

PRD ccccccccccccccchhhhhhccchhhhhhhhhcchhhhhhhhhhhhhhhhhhhcccccc 

SEQ PRDQLRGHPDIPRLLTLDV 

SEG 

PRD ccccccccccccccccccc 

(No Prosite data available for DKFZphtes3_50n23.1) 
(No Pfam data available for DKFZphtes3_50n23.1) 

DKFZphtes3_6b21 



group: testes derived 

DKFZphtes3_6b21 encodes a novel 781 amino acid protein with similarity to the human KIAA0256 
gene product. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to KIAA0256 

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus: /map="356.3 cR from top of Chr9 linkage group" 
Insert length: 3360 bp 

Poly A stretch at pos. 3314, polyadenylation signal at pos. 3300 
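
The polyadenylation signal position quoted above can be checked against the printed sequence. The following is a minimal sketch, assuming the signal is the canonical AATAAA hexamer (the listing does not name the motif); the function name and the choice of fragment are illustrative only. In the printed bases 3291-3310 the hexamer starts at 1-based position 3301, which coincides with the quoted "pos. 3300" only if that figure is read as a zero-based offset. That reading is an inference from this fragment, not a convention stated in the listing.

```python
def find_polya_signal(seq: str, motif: str = "AATAAA") -> int:
    """Return the 0-based offset of the last occurrence of motif, or -1 if absent."""
    return seq.upper().rfind(motif)

# Bases 3291-3310 of the DKFZphtes3_6b21 insert, copied from the listing
fragment = "ATTTATTTTAAATAAAGAGT"
fragment_start = 3291  # 1-based coordinate of the first base in the fragment

offset = find_polya_signal(fragment)
position_1based = fragment_start + offset   # coordinate counted from 1
position_0based = position_1based - 1       # same site counted from 0

print(position_1based, position_0based)
```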



1 GGCAAGCCGA CGGCCCGCTG CTGGCCTCCG TGACGCGGCC TCCTCCGCGC 
51 CTCGCGGCAT GGCGTCGGAG GGGCCGCGGG AGCCCGAAAG CGAGGGCATC 
101 AAGTTATCAG CAGATGTCAA ACCATTTGTC CCCAGATTTG CCGGGCTCAA 
151 TGTGGCATGG TTAGAGTCCT CAGAAGCATG TGTCTTCCCC AGCTCTGCAG 
201 CCACATACTA TCCGTTTGTT CAGGAACCAC CAGTGACAGA AATGTTTACT 
251 CAGTGCCTGG CTCCCAGTAT CTTTATAACC AACCCAGTTG TTACCGAGGT 
301 TTTCAAACAG TGAAGCATCG AAATGAGAAC ACATGCCCTC TCCCACAAGA 
351 AATGAAAGCT CTGTTTAAGA AGAAAACCTA TGATGAGAAA AAAACGTATG 
401 ATCAGCAAAA GTTTGACAGT GAAAGGGCTG ATGGAACTAT ATCATCTGAG 
451 ATAAAATCAG CTAGAGGTTC ACATCATTTG TCCATTTACG CTGAGAATAG 
501 TTTGAAATCA GATGGTTACC ATAAGCGAAC AGACAGGAAA TCCAGAATCA 
551 TTGCAAAAAA TGTATCTACC TCCAAACCTG AGTTTGAATT TACCACACTG 
601 GACTTTCCTG AACTGCAAGG TGCAGAGAAC AATATGTCAG AGATACAGAA 
651 GCAACCCAAG TGGGGACCTG TCCACTCTGT CTCTACCGAC ATTTCTCTTC 
701 TAAGAGAAGT AGTAAAACCA GCTGCAGTGT TATCAAAGGG TGAAATAGTG 
751 GTGAAAAATA ACCCAAATGA ATCTGTAACT GCTAATGCCG CTACCAATTC 
801 TCCTTCATGT ACAAGAGAGT TATCTTGGAC ACCAATGGGT TATGTTGTTC 
851 GACAGACATT ATCTACAGAA CTGTCAGCAG CCCCTAAAAA TGTTACTTCT 
901 ATGATAAACT TAAAGACCAT TGCTTCATCA GCAGATCCTA AAAATGTTAG 
951 TATACCATCT TCTGAAGCTT TATCTTCGGA TCCTTCCTAC AACAAAGAAA 
1001 AACACATTAT TCATCCTACC CAAAAGTCTA AAGCATCACA AGGTAGTGAC 
1051 CTTGAACAAA ATGAAGCCTC AAGAAAGAAT AAGAAAAAGA AAGAAAAATC 
1101 TACATCAAAA TATGAAGTCC TGACAGTTCA AGAGCCTCCA AGGATTGAAG 
1151 ATGCCGAGGA ATTTCCCAAC CTGGCAGTTG CATCTGAAAG AAGAGACAGA 
1201 ATAGAGACAC CGAAATTTCA ATCTAAGCAG CAGCCACAGG ATAATTTTAA 
1251 AAATAATGTA AAGAAGAGCC AGCTTCCAGT GCAGTTGGAC TTGGGGGGCA 
1301 TGCTGACAGC CCTGGAGAAG AAGCAGCACT CTCAGCATGC AAAGCAGTCC 
1351 TCCAAACCAG TGGTAGTCTC AGTTGGAGCA GTGCCAGTCC TTTCCAAAGA 
1401 ATGTGCATCA GGGGAGAGAG GCCGCCGCAT GAGTCAAATG AAGACCCCGC 
1451 ACAATCCCTT GGACTCCAGC GCCCCACTGA TGAAGAAAGG GAAGCAGAGG 
1501 GAGATCCCCA AGGCCAAGAA GCCAACCTCA CTGAAGAAGA TTATTTTGAA 
1551 AGAACGGCAA GAGAGAAAGC AGCGTCTCCA AGAAAATGCT GTGAGTCCAG 
1601 CTTTTACCAG TGATGACACA CAAGATGGAG AGAGTGGTGG TGATGACCAG 
1651 TTTCCCGAGC AGGCAGAGCT GTCAGGGCCA GAGGGGATGG ACGAACTGAT 
1701 CTCCACTCCT TCGGTTGAGG ACAAGTCTGA AGAGCCACCA GGCACAGAGC 
1751 TCCAGAGGGA CACAGAGGCC TCCCACCTTG CTCCCAATCA CACCACCTTC 
1801 CCTAAGATCC ACAGCCGCAG ATTCAGGGAT TACTGCAGCC AGATGCTTAG 
1851 TAAAGAAGTG GATGCTTGTG TTACCGACCT ACTCAAAGAA CTGGTCCGTT 
1901 TCCAAGACCG TATGTACCAG AAAGATCCAG TCAAGGCCAA GACTAAACGT 
1951 CGACTTGTGT TGGGGTTGAG GGAGGTTCTC AAACACCTGA AGCTCAAAAA 
2001 ACTGAAATGT GTCATTATTT CTCCCAACTG TGAGAAGATA CAGTCAAAAG 
2051 GTGGGCTGGA TGACACTTTG CACACAATTA TTGATTATGC CTGTGAGCAG 
2101 AACATTCCCT TTGTGTTTGC TCTCAACCGC AAAGCTCTGG GGCGCAGTTT 
2151 GAATAAGGCA GTTCCTGTCA GTGTGGTGGG GATCTTCAGC TATGATGGGG 
2201 CCCAGGATCA GTTCCACAAG ATGGTTGAGC TGACAGTGGC GGCCCGACAG 
2251 GCGTACAAGA CCATGCTGGA GAATGTGCAG CAGGAGCTGG TGGGAGAGCC 
2301 CAGGCCTCAG GCACCTCCCA GCCTACCCAC ACAGGGCCCC AGCTGCCCTG 
2351 CAGAAGATGG CCCCCCAGCC CTGAAAGAAA AAGAAGAGCC ACACTACATT 
2401 GAAATCTGGA AAAAACATCT GGAAGCATAC AGTGGATGTA CCCTGGAGCT 
2451 AGAAGAATCC TTGGAGGCTT CAACCTCTCA AATGATGAAT TTGAATTTAT 
2501 GAGAGTTCTT GCCTGTGTGT CTGTATTTTG GGTAAGGAGG GGAGGTCTGA 
2551 AAAAGACTTT GGGGCTTTTT CTTCTGTTTT TCATGACAAT GTAATTTGTG 
2601 TAACTGTTGA ATCTGGAAAT TGATCAGCAT TAAAGGGCAC ATGAAGCAGT 
2651 GTCTGCAGGC GTTCAGTGCT GCGGAGCCTG TTAAAGGTCA CTCAGATGTG 
2701 CAGGTGTTAA TCTTCTCTAA AAGCCTGGTT ATACAGCTCT GGCTTTCTGA 
2751 GCACACTACG GATCTGGAAA ATACTGGAAA ATGTGATACT TAGAATACTT 
2801 TGGCTGCTAA GGAAACTTCC TCTCCATTGC AGAATAGCTG AGCCAAGTGA 
2851 GTGAGTTTGC AGAAAGCAGG TGGTGAGCTC CTGCCTGCTG GAGGTTGCCA 
2901 TGGAGGGCCA TTCCTGCCCG GCAACAGCAC CGTCCTGCAG GGAGCCACTT 
2951 GGCAGAAGGG TGCAGGGCTG CTGGTGTCAG AGCAAGAGGG CTACAGGGAA 
3001 AGGGCCCTTT CTCAGGGGAT GTAGCTTTTT TAAAAGATTT GGGAACACTT 
3051 GGAGGATTTG CTAAAATGAG CCTCAGAAGG AAAATTGGTT TTCTAACCTG 
3101 TGACTTTTTG AAATGAATTA TTCCTTTCAG TCTTTATTTT TCAAAGAAAC 
3151 AATGTGTATT GAAGTACCTA GATTTGTTTG ATAATCAACA AATCTTTCCT 
3201 TTTTCAATGA ACATATTCTG AATGTGGTTT CTGTCTTAGA CCAGGAGGAC 
3251 AGAGTTTGCT TTCATATTTT CCCTGTAAGT AAGAGGGCTT ATTTATTTTA 
3301 AATAAAGAGT AATTATTAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3351 AAAAAAAAAA 



BLAST Results 



Entry HS773347 from database EMBL: 
human STS WI-18160. 
Score = 813, P = 2.9e-30, identities = 167/171 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 157 bp to 2499 bp; peptide length: 781 
Category: similarity to known protein 
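
The ORF coordinates and the peptide length are tied together by simple arithmetic. A minimal sketch, assuming both coordinates are 1-based and inclusive and that the stop codon lies outside the quoted range (the function name is hypothetical):

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in an ORF given inclusive, 1-based nucleotide coordinates."""
    span = orf_end - orf_start + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} nt is not a whole number of codons")
    return span // 3

# ORF from 157 bp to 2499 bp, as quoted for DKFZphtes3_6b21
print(peptide_length(157, 2499))
```

Under these assumptions the span is 2343 nucleotides, i.e. 781 codons, in agreement with the stated peptide length.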



1 MVRVLRSMCL PQLCSHILSV CSGTTSDRNV YSVPGSQYLY NQPSCYRGFQ 
51 TVKHRNENTC PLPQEMKALF KKKTYDEKKT YDQQKFDSER ADGTISSEIK 
101 SARGSHHLSI YAENSLKSDG YHKRTDRKSR IIAKNVSTSK PEFEFTTLDF 
151 PELQGAENNM SEIQKQPKWG PVHSVSTDIS LLREVVKPAA VLSKGEIVVK 
201 NNPNESVTAN AATNSPSCTR ELSWTPMGYV VRQTLSTELS AAPKNVTSMI 
251 NLKTIASSAD PKNVSIPSSE ALSSDPSYNK EKHIIHPTQK SKASQGSDLE 
301 QNEASRKNKK KKEKSTSKYE VLTVQEPPRI EDAEEFPNLA VASERRDRIE 
351 TPKFQSKQQP QDNFKNNVKK SQLPVQLDLG GMLTALEKKQ HSQHAKQSSK 
401 PVVVSVGAVP VLSKECASGE RGRRMSQMKT PHNPLDSSAP LMKKGKQREI 
451 PKAKKPTSLK KIILKERQER KQRLQENAVS PAFTSDDTQD GESGGDDQFP 
501 EQAELSGPEG MDELISTPSV EDKSEEPPGT ELQRDTEASH LAPNHTTFPK 
551 IHSRRFRDYC SQMLSKEVDA CVTDLLKELV RFQDRMYQKD PVKAKTKRRL 
601 VLGLREVLKH LKLKKLKCVI ISPNCEKIQS KGGLDDTLHT IIDYACEQNI 
651 PFVFALNRKA LGRSLNKAVP VSVVGIFSYD GAQDQFHKMV ELTVAARQAY 
701 KTMLENVQQE LVGEPRPQAP PSLPTQGPSC PAEDGPPALK EKEEPHYIEI 
751 WKKHLEAYSG CTLELEESLE ASTSQMMNLN L 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_6b21, frame 1 

SWISSPROT:Y256_HUMAN HYPOTHETICAL PROTEIN KIAA0256., N = 1, Score = 
786, P = 3.6e-78 

TREMBL:PFMAL3P3_15 gene: "MAL3P3.15"; Plasmodium falciparum MAL3P3, N 
= 2, Score = 161, P = 5.1e-10 

TREMBL:RNNFLH_1 Rat heavy neurofilament subunit (NF-H) mRNA, 3' end., N 
= 1, Score = 150, P = 9.1e-07 



>SWISSPROT:Y256_HUMAN HYPOTHETICAL PROTEIN KIAA0256. 
Length = 635 

HSPs: 

Score = 786 (117.9 bits), Expect = 3.6e-78, P = 3.6e-78 
Identities = 190/424 (44%), Positives = 263/424 (62%) 

Query: 369 KKSQLPVQLDLGGMLTALEKKQHSQHAKQ--SSKPVVVSVGAVPVLSKECASGERGRRMS 426 

KK++ PVQLDLG ML ALEK+Q + A+Q ++ + P+ +V + ++ + S 

Sbjct: 16 KKNKTPVQLDLGDMLAALEKQQQAMKARQITNTRPLSYTVVTAASFHTKDSTNRKPLTKS 75 

Query: 427 Q-MKTPHNPLDSSAPLMKKGKQREIPKAKKPTSLKKIILKERQERKQRLQENAVSPAFTS 485 

Q T N +D ++ KKGK++EI K K+PT+LKK+ILKER+E+K RL + S 
Sbjct: 76 QPCLTSFNSVDIASSKAKKGKEKEIAKLKRPTALKKVILKEREEKKGRLTVD--HNLLGS 133 

Query: 486 DDTQDGESGGDDQFPEQAELSGPEGMDELISTPSVEDKSEEPPG--TELQRDTEASHL-- 541 

+ + + D P+ + G+ + S S+ S+ P T + + + AS 

Sbjct: 134 EEPTEMHLDFIDDLPQEIVSQEDTGLS-MPSDTSLSPASQNSPYCMTPVSQGSPASSGIG 192 

Query: 542 APN-HTTFPKIHSRRFRDYCSQMLSKEVDACVTDLLKELVRFQDRMYQKDPVKAKTKRRL 600 

+P +T KIHS+RFR+YC+Q+L KE+D CVT LL+ELV FQ+R+YQKDPV+AK +RRL 
Sbjct: 193 SPMASSTITKIHSKRFREYCNQVLCKEIDECVTLLLQELVSFQERIYQKDPVRAKARRRL 252 

Query: 601 VLGLREVLKHLKLKKLKCVIISPNCEKIQSKGGLDDTLHTIIDYACEQNIPFVFALNRKA 660 

V+GLREV KH + KL K+KCVI I S PNCEKIQSKGGLD+ L+ +1 A EQ IPFVFAL RKA 
Sbjct: 253 VMGLREVTKHMKLNKIKCVIISPNCEKIQSKGGLDEALYNVIAMAREQEIPFVFALGRKA 312 

Query: 661 LGRSLNKAVPVSVVGIFSYDGAQDQFHKMVELTVAARQAYKTMLENVQQELVGEPRP 717 

LGR +NK VPVSVVGIF+Y GA+ F+K+VELT AR+AYK M+ ++QE E 
Sbjct: 313 LGRCVNKLVPVSVVGIFNYFGAESLFNKLVELTEEARKAYKDMVAAMEQEQAEEALKNVK 372 

Query: 718 QAPPSLP-TQGPS CPAEDGPPALKEKEEPHYIEIWKKHLEAYSGCTL ELE 766 

+ P + ++ PS C P+EEYW++EG EE 

Sbjct: 373 KVPHHMGHSRNPSAASAISFCSVISEP--ISEVNEKEYETNWRNMVETSDGLEASENEKE 430 

Query: 767 ESLEASTSQ 775 

S + STS+ 
Sbjct: 431 VSCKHSTSE 439 
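
The percentages in the HSP header above follow directly from the match counts. A minimal sketch (the function name is hypothetical); the printed figures are consistent with truncation to a whole number rather than rounding, since 190/424 is about 44.8% yet is reported as 44%:

```python
def hsp_percent(matches: int, aligned_length: int) -> int:
    """Whole-number percentage, truncated toward zero (not rounded)."""
    return 100 * matches // aligned_length

identities = hsp_percent(190, 424)
positives = hsp_percent(263, 424)
print(f"Identities = 190/424 ({identities}%), Positives = 263/424 ({positives}%)")
```

The same calculation reproduces the 55% and 72% reported for the 576/1033 and 750/1033 counts elsewhere in this listing.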

Pedant information for DKFZphtes3_6b21, frame 1 

Report for DKFZphtes3_6b21.1 

[LENGTH] 781 

[MW] 87393.44 

[pI] 8.94 

[HOMOL] SWISSPROT:Y256_HUMAN HYPOTHETICAL PROTEIN KIAA0256. 4e-75 

[PROSITE] MYRISTYL 4 

[PROSITE] AMIDATION 1 

[PROSITE] CAMP_PHOSPHO_SITE 3 

[PROSITE] CK2_PHOSPHO_SITE 16 

[PROSITE] TYR_PHOSPHO_SITE 4 

[PROSITE] PKC_PHOSPHO_SITE 16 

[PROSITE] ASN_GLYCOSYLATION 6 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 8.45 % 



SEQ MVRVLRSMCLPQLCSHILSVCSGTTSDRNVYSVPGSQYLYNQPSCYRGFQTVKHRNENTC 
SEG 
PRD ccceeeeeccceeeeeeeeeeccccccccccccccccccccccceeeceeeeeecccccc 

SEQ PLPQEMKALFKKKTYDEKKTYDQQKFDSERADGTISSEIKSARGSHHLSIYAENSLKSDG 
SEG xxxxxxxxxxxx 
PRD cccchhhhhhhhhhccchhhhhhhhhhhccccccchhhhhhhcccceeeeeeeecccccc 

SEQ YHKRTDRKSRIIAKNVSTSKPEFEFTTLDFPELQGAENNMSEIQKQPKWGPVHSVSTDIS 
SEG 
PRD cccccchhhhheeeccccccccceeecccccccccccchhhhhhccccccccceeecchh 

SEQ LLREVVKPAAVLSKGEIVVKNNPNESVTANAATNSPSCTRELSWTPMGYVVRQTLSTELS 
SEG 
PRD hhhhhhheeeeecccceeeeccccceeeeeecccccccceeeeeccceeeeeeccccccc 

SEQ AAPKNVTSMINLKTIASSADPKNVSIPSSEALSSDPSYNKEKHIIHPTQKSKASQGSDLE 
SEG 
PRD ccccceeeeehhhhhhcccccceeeecccccccccccccccceeechhhhhhhcccccch 

SEQ QNEASRKNKKKKEKSTSKYEVLTVQEPPRIEDAEEFPNLAVASERRDRIETPKFQSKQQP 
SEG ....xxxxxxxxxxxxxx 
PRD hhhhccccccccccccceeeeeecccccchhhhhhccchhhhhhhhhhhhcccccccccc 

SEQ QDNFKNNVKKSQLPVQLDLGGMLTALEKKQHSQHAKQSSKPVVVSVGAVPVLSKECASGE 
SEG xxxxxxxxxxxxxxxxx 
PRD cccccccccccccccccccchhhhhhhhhhhhhhhhhhhccceeeeeeeeeeeecccccc 

SEQ RGRRMSQMKTPHNPLDSSAPLMKKGKQREIPKAKKPTSLKKIILKERQERKQRLQENAVS 



PRD chhhhhhcccccccccccccchhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhcc 

SEQ PAFTSDDTQDGESGGDDQFPEQAELSGPEGMDELISTPSVEDKSEEPPGTELQRDTEASH 

SEG 

PRD ccccccccccccccccccchhhhhhcccccceeeeccccccccccccccccccccccccc 

SEQ LAPNHTTFPKIHSRRFRDYCSQMLSKEVDACVTDLLKELVRFQDRMYQKDPVKAKTKRRL 



PRD ccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhh 

SEQ VLGLREVLKHLKLKKLKCVIISPNCEKIQSKGGLDDTLHTIIDYACEQNIPFVFALNRKA 

SEG xxxxxxxxxx 

PRD hhhhhhhhhhhhhhhheeeeecccccccccccccchhhhhhhhhhhhcccceeeeccccc 

SEQ LGRSLNKAVPVSVVGIFSYDGAQDQFHKMVELTVAARQAYKTMLENVQQELVGEPRPQAP 

SEG 

PRD cccccccceeeeeeeeecccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccc 

SEQ PSLPTQGPSCPAEDGPPALKEKEEPHYIEIWKKHLEAYSGCTLELEESLEASTSQMMNLN 

SEG xxxxxxxxxxxxx 

PRD cccccccccccccccchhhhhhcccceeeehhhhhhhhhchhhhhhhhhhhhhhhccccc 

SEQ L 
SEG 

PRD c 



Prosite for DKFZphtes3_6b21.1 



PS00001 135->139 ASN_GLYCOSYLATION PDOC00001 
PS00001 159->163 ASN_GLYCOSYLATION PDOC00001 
PS00001 204->208 ASN_GLYCOSYLATION PDOC00001 
PS00001 245->249 ASN_GLYCOSYLATION PDOC00001 
PS00001 263->267 ASN_GLYCOSYLATION PDOC00001 
PS00001 544->548 ASN_GLYCOSYLATION PDOC00001 
PS00004 71->75 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 423->427 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 454->458 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 26->29 PKC_PHOSPHO_SITE PDOC00005 
PS00005 51->54 PKC_PHOSPHO_SITE PDOC00005 
PS00005 88->91 PKC_PHOSPHO_SITE PDOC00005 
PS00005 101->104 PKC_PHOSPHO_SITE PDOC00005 
PS00005 115->118 PKC_PHOSPHO_SITE PDOC00005 
PS00005 125->128 PKC_PHOSPHO_SITE PDOC00005 
PS00005 138->141 PKC_PHOSPHO_SITE PDOC00005 
PS00005 288->291 PKC_PHOSPHO_SITE PDOC00005 
PS00005 305->308 PKC_PHOSPHO_SITE PDOC00005 
PS00005 316->319 PKC_PHOSPHO_SITE PDOC00005 
PS00005 343->346 PKC_PHOSPHO_SITE PDOC00005 
PS00005 351->354 PKC_PHOSPHO_SITE PDOC00005 
PS00005 398->401 PKC_PHOSPHO_SITE PDOC00005 
PS00005 458->461 PKC_PHOSPHO_SITE PDOC00005 
PS00005 553->556 PKC_PHOSPHO_SITE PDOC00005 
PS00005 596->599 PKC_PHOSPHO_SITE PDOC00005 
PS00006 24->28 CK2_PHOSPHO_SITE PDOC00006 
PS00006 74->78 CK2_PHOSPHO_SITE PDOC00006 
PS00006 139->143 CK2_PHOSPHO_SITE PDOC00006 
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006 
PS00006 193->197 CK2_PHOSPHO_SITE PDOC00006 
PS00006 257->261 CK2_PHOSPHO_SITE PDOC00006 
PS00006 297->301 CK2_PHOSPHO_SITE PDOC00006 
PS00006 317->321 CK2_PHOSPHO_SITE PDOC00006 
PS00006 323->327 CK2_PHOSPHO_SITE PDOC00006 
PS00006 384->388 CK2_PHOSPHO_SITE PDOC00006 
PS00006 484->488 CK2_PHOSPHO_SITE PDOC00006 
PS00006 493->497 CK2_PHOSPHO_SITE PDOC00006 
PS00006 506->510 CK2_PHOSPHO_SITE PDOC00006 
PS00006 519->523 CK2_PHOSPHO_SITE PDOC00006 
PS00006 640->644 CK2_PHOSPHO_SITE PDOC00006 
PS00006 702->706 CK2_PHOSPHO_SITE PDOC00006 
PS00007 581->588 TYR_PHOSPHO_SITE PDOC00007 
PS00007 740->748 TYR_PHOSPHO_SITE PDOC00007 
PS00007 740->748 TYR_PHOSPHO_SITE PDOC00007 
PS00007 73->82 TYR_PHOSPHO_SITE PDOC00007 
PS00008 93->99 MYRISTYL PDOC00008 
PS00008 155->161 MYRISTYL PDOC00008 
PS00008 380->386 MYRISTYL PDOC00008 






PS00008 633->639 MYRISTYL PDOC00008 

PS00009 421->425 AMIDATION PDOC00009 



(No Pfam data available for DKFZphtes3_6b21.1) 

DKFZphtes3_6cll 



group: signal transduction 

DKFZphtes3_6cll encodes a novel 1025 amino acid protein with similarity to A. ambisexualis 
antheridiol steroid receptor. 

The novel protein is a putative steroid receptor. It shares similarity with yeast YNL132w and 
contains the ATP/GTP-binding site motif A (P-loop) and RGD site, similar to the A. 
ambisexualis antheridiol steroid receptor. 

The new protein can find application in modulating/blocking the expression of genes controlled 
by this receptor. 



strong similarity to YNL132w 

strong similarity to S. pombe/YDK9_SCHPO, S. cerevisiae/YNL132w, 
C. elegans/F55A12.8 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 3966 bp 

Poly A stretch at pos. 3890, polyadenylation signal at pos. 3873 



1 GCTGTGCCTT CTCTTTCGGA GTTGTTCCGT GCTCCCACGT GCTTCCCCTT 
51 CTCCACTGGC TGGGATCCCC CGGGCTCGGG GCGCAGTAAT AATTTTTCAC 
101 CATGCATCGG AAAAAGGTGG ATAACCGAAT CCGGATTCTC ATTGAGAATG 
151 GAGTAGCTGA GCGGCAAAGA TCTCTCTTTG TTGTAGTTGG GGATCGAGGA 
201 AAAGATCAGG TGGTAATACT TCATCACATG TTATCCAAAG CAACTGTGAA 
251 GGCTCGGCCT TCAGTGCTGT GGTGTTATAA GAAAGAGCTG GGGTTTAGCA 
301 GTCACCGGAA GAAAAGAATG CGACAGCTGC AGAAGAAAAT AAAGAATGGA 
351 ACACTGAACA TAAAGCAGGA CGACCCCTTT GAACTCTTCA TAGCAGCCAC 
401 AAACATTCGC TACTGCTACT ACAACGAGAC CCACAAGATC CTGGGCAATA 
451 CCTTCGGCAT GTGTGTGCTG CAGGATTTTG AAGCCTTAAC TCCAAACTTG 
501 CTGGCCAGGA CTGTAGAAAC AGTGGAAGGT GGTGGGCTAG TGGTCATCCT 
551 CCTACGGACC ATGAACTCAC TCAAGCAATT GTACACAGTG ACTATGGATG 
601 TGCATTCCAG GTACAGAACT GAGGCCCATC AGGATGTGGT GGGAAGATTT 
651 AATGAAAGGT TTATTCTGTC TCTGGCCTCT TGTAAGAAGT GTCTCGTCAT 
701 TGATGACCAG CTCAACATCC TGCCCATCTC CTCCCACGTT GCCACCATGG 
751 AGGCCCTGCC TCCCCAGACT CCGGATGAGA GTCTTGGTCC TTCTGATCTG 
801 GAGCTGAGGG AGTTGAAGGA GAGCTTGCAG GACACCCAGC CTGTGGGTGT 
851 GTTGGTGGAC TGCTGTAAGA CTCTAGACCA GGCCAAAGCT GTCTTGAAAT 
901 TTATCGAGGG CATCTCTGAA AAGACCCTGA GGAGTACTGT TGCACTCACA 
951 GCTGCTCGAG GACGGGGAAA ATCTGCAGCC CTGGGATTGG CGATTGCTGG 
1001 GGCGGTGGCA TTTGGGTACT CCAATATCTT TGTTACCTCC CCAAGCCCTG 
1051 ATAACCTCCA TACTCTGTTT GAATTTGTAT TTAAAGGATT TGATGCTCTG 
1101 CAATATCAGG AACATCTGGA TTATGAGATT ATCCAGTCTC TAAATCCTGA 
1151 ATTTAACAAA GCAGTGATCA GAGTGAATGT ATTTCGAGAA CACAGGCAGA 
1201 CTATTCAGTA TATACATCCT GCAGATGCTG TGAAGCTGGG CCAGGCTGAA 
1251 CTAGTTGTGA TTGATGAAGC TGCCGCCATC CCCCTCCCCT TGGTGAAGAG 
1301 CCTACTTGGC CCCTACCTTG TTTTCATGGC ATCCACCATC AATGGCTATG 
1351 AGGGCACTGG CCGGTCACTG TCCCTCAAGC TAATTCAGCA GCTCCGTCAA 
1401 CAGAGCGCCC AGAGCCAGGT CAGCACCACT GCTGAGAATA AGACCACGAC 
1451 GACAGCCAGA TTGGCATCAG CGCGGACACT GCATGAGGTT TCCCTCCAGG 
1501 AGTCAATCCG ATACGCCCCT GGGGATGCAG TGGAGAAGTG GCTGAATGAC 
1551 TTGCTGTGCC TGGATTGCCT CAACATCACT CGGATAGTCT CAGGCTGCCC 
1601 CTTGCCTGAA GCTTGTGAAC TGTACTATGT TAATAGAGAT ACCCTCTTTT 
1651 GCTACCACAA GGCCTCTGAA GTTTTCCTCC AACGGCTTAT GGCCCTCTAC 
1701 GTGGCTTCTC ACTACAAGAA CTCTCCCAAT GATCTCCAGA TGCTCTCCGA 
1751 TGCACCTGCT CACCATCTCT TCTGCCTTCT GCCTCCTGTG CCCCCCACCC 
1801 AGAATGCCCT TCCAGAAGTG CTTGCTGTTA TCCAGGTGTC CCTTGAAGGG 
1851 GAGATTTCTC GCCAGTCCAT CTTGAACAGT CTGTCTCGAG GCAAGAAGGC 
1901 TTCAGGGGAC CTGATTCCAT GGACAGTGTC AGAACAGTTC CAAGATCCAG 
1951 ACTTTGGTGG TCTGTCTGGT GGAAGGGTCG TTCGCATTGC TGTTCACCCA 
2001 GATTATCAAG GGATGGGCTA TGGCAGCCGT GCTCTGCAGC TGCTGCAGAT 
2051 GTACTATGAA GGCAGGTTTC CTTGTCTGGA GGAAAAGGTC CTTGAGACAC 
2101 CACAGGAAAT TCACACCGTA AGCAGCGAGG CTGTCAGCTT GTTGGAAGAG 
2151 GTCATCACTC CCCGGAAGGA CCTGCCTCCT TTACTCCTCA AATTGAATGA 
2201 GAGGCCTGCC GAACGCCTGG ATTACCTGGG TGTTTCCTAT GGCTTGACCC 
2251 CCAGGCTCCT CAAGTTCTGG AAACGAGCTG GATTTGTTCC TGTTTATCTG 
2301 AGACAGACCC CGAATGACCT GACCGGAGAG CACTCGTGCA TCATGCTGAA 
2351 GACGCTCACT GATGAGGATG AGGCTGACCA GGGAGGCTGG CTTGCAGCCT 
2401 TCTGGAAAGA TTTCCGACGG CGGTTCCTAG CCTTGCTCTC CTACCAGTTC 
2451 AGTACCTTCT CTCCTTCCCT GGCTCTGAAC ATCATTCAGA ACAGGAACAT 
2501 GGGGAAGCCA GCCCAGCCTG CCCTGAGCCG GGAGGAGCTG GAAGCACTCT 
2551 TCCTCCCCTA TGACCTGAAG CGGCTGGAGA TGTATTCACG GAATATGGTG 
2601 GACTATCACC TCATCATGGA CATGATCCCG GCCATCTCTC GCATCTATTT 
2651 CCTGAACCAG CTGGGGGACC TGGCCCTGTC TGCGGCTCAG TCGGCTCTTC 
2701 TCTTGGGGAT TGGCCTGCAG CATAAGTCTG TGGACCAGCT GGAAAAGGAG 
2751 ATTGAGCTGC CCTCGGGCCA GTTGATGGGA CTTTTCAACC GGATCATCCG 
2801 CAAAGTTGTG AAGCTATTTA ATGAAGTTCA GGAAAAGGCC ATTGAGGAGC 
2851 AGATGGTGGC AGCGAAGGAT GTGGTCATGG AGCCCACGAT GAAGACCCTC 
2901 AGTGACGACC TAGATGAAGC AGCAAAGGAA TTTCAGGAGA AACACAAGAA 
2951 GGAAGTAGGG AAGCTGAAGA GCATGGACCT CTCTGAATAC ATAATCCGTG 
3001 GGGACGATGA AGAGTGGAAT GAAGTTTTGA ACAAAGCTGG GCCGAACGCC 
3051 TCGATCATCA GCCTGAAAAG TGACAAGAAA AGGAAGTTAG AGGCCAAACA 
3101 AGAACCCAAA CAGAGCAAGA AGTTGAAGAA CAGAGAGACA AAGAACAAAA 
3151 AAGATATGAA ACTGAAGCGG AAGAAATAGT GAAGAGAAAC TCGGGCATCT 
3201 GTGTTTGATC ATGGGAAGAT ACTCTCACTA ACTGAACCCT CTCTGGCTGG 
3251 ACTGTTAAAA GCAACGAGAG GCCCCGGCAC ACCTGGAAGC TGGCCGCGAA 
3301 TTCGGCCTCT GGGCCTGTGT GTCTGTGAGC TCAACCTGGC TAAAGGCAGA 
3351 GTCACTCCCA AATGGGTCTC TTTAGAACTT GATGGCTGGG CACTGCCATC 
3401 TCTAGAATTG CCACGAGTCT CTCTCTTCCT GCCCAGTCCA GGGCCCTCCT 
3451 TTCCTATAAG TTCATATTTT GCTTTGAGCC AGCTTTTTAG TCTCATTCCC 
3501 ACACATGTGG AAGCCACGTT GCCTCTCGAC CGCCTGAGGC CCTTAAGTAC 
3551 ATCGCTTTCT GGTGGTGCCC AGGAGGCTGC TGCTGGGCCG CTGGGTCTCT 
3601 CTTTGTGGAC TTGTACCTGG AGCAGGAGGA ACTCCAGTCC GTCCCGGCAT 
3651 CCATGGCAGC CCGCGGTTAG GTGCGCCAGG GTTTGCTGAT GTTGTCTTGT 
3701 GCTGTTCCAC TCTTGGCTCC AGCAGACCCA CTGTCCCAGA AAAGCCTGAT 
3751 CCTGTAGTTT ATGTAGAATG CCACATCTGC GTCCTCAAGA CCTGTTTCAT 
3801 CCATTTGGGA AAAGATGTTG GGAAAGGCCA CTTTGCTCGC AGGGGTGAGG 
3851 GGAAGGATAG AGAATCTATT TTTAATAAAT AACATTCTAG AATGAAAAAA 
3901 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3951 AAAAAAAAAA AAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 102 bp to 3176 bp; peptide length: 1025 
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: RGD (966-969) 
ATP_GTP_A (284-292) 



1 MHRKKVDNRI RILIENGVAE RQRSLFVVVG DRGKDQVVIL HHMLSKATVK 
51 ARPSVLWCYK KELGFSSHRK KRMRQLQKKI KNGTLNIKQD DPFELFIAAT 
101 NIRYCYYNET HKILGNTFGM CVLQDFEALT PNLLARTVET VEGGGLVVIL 
151 LRTMNSLKQL YTVTMDVHSR YRTEAHQDVV GRFNERFILS LASCKKCLVI 
201 DDQLNILPIS SHVATMEALP PQTPDESLGP SDLELRELKE SLQDTQPVGV 
251 LVDCCKTLDQ AKAVLKFIEG ISEKTLRSTV ALTAARGRGK SAALGLAIAG 
301 AVAFGYSNIF VTSPSPDNLH TLFEFVFKGF DALQYQEHLD YEIIQSLNPE 
351 FNKAVIRVNV FREHRQTIQY IHPADAVKLG QAELVVIDEA AAIPLPLVKS 
401 LLGPYLVFMA STINGYEGTG RSLSLKLIQQ LRQQSAQSQV STTAENKTTT 
451 TARLASARTL HEVSLQESIR YAPGDAVEKW LNDLLCLDCL NITRIVSGCP 
501 LPEACELYYV NRDTLFCYHK ASEVFLQRLM ALYVASHYKN SPNDLQMLSD 
551 APAHHLFCLL PPVPPTQNAL PEVLAVIQVC LEGEISRQSI LNSLSRGKKA 
601 SGDLIPWTVS EQFQDPDFGG LSGGRVVRIA VHPDYQGMGY GSRALQLLQM 
651 YYEGRFPCLE EKVLETPQEI HTVSSEAVSL LEEVITPRKD LPPLLLKLNE 
701 RPAERLDYLG VSYGLTPRLL KFWKRAGFVP VYLRQTPNDL TGEHSCIMLK 
751 TLTDEDEADQ GGWLAAFWKD FRRRFLALLS YQFSTFSPSL ALNIIQNRNM 
801 GKPAQPALSR EELEALFLPY DLKRLEMYSR NMVDYHLIMD MIPAISRIYF 
851 LNQLGDLALS AAQSALLLGI GLQHKSVDQL EKEIELPSGQ LMGLFNRIIR 
901 KVVKLFNEVQ EKAIEEQMVA AKDVVMEPTM KTLSDDLDEA AKEFQEKHKK 
951 EVGKLKSMDL SEYIIRGDDE EWNEVLNKAG PNASIISLKS DKKRKLEAKQ 
1001 EPKQSKKLKN RETKNKKDMK LKRKK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_6cll, frame 3 

TREMBL:CEAF3 1 30_4 gene: "F55A12.8"; Caenorhabditis elegans cosmid 
F55A12., N = 1, Score = 2782, P = 1.1e-289 

PIR:S55151 probable membrane protein YNL132w - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 2549, P = 3.5e-273 

SWISSPROT:YXXl_ACHAM HYPOTHETICAL PROTEIN (FRAGMENT)., N = 1, Score = 
1013, P = 3.2e-102 

SWISSPROT:YDK9_SCHPO HYPOTHETICAL 116.5 KD PROTEIN C20G8.09C IN 
CHROMOSOME I., N = 1, Score = 2843, P = 3.8e-296 



>SWISSPROT:YDK9_SCHPO HYPOTHETICAL 116.5 KD PROTEIN C20G8.09C IN CHROMOSOME 
I. 

Length = 1033 

HSPs: 

Score = 2843 (426.6 bits), Expect = 3.8e-296, P = 3.8e-296 
Identities = 576/1033 (55%), Positives = 750/1033 (72%) 

Query: 1 MHRKKVDNRIRILIENGVAERQRSLFVVVGDRGKDQVVILHHMLSKATVKARPSVLWCYK 60 

M +K +D+RI LI+NG 'E+QRS FVWGDR + DQW LH +LS++ V ARP+VLW YK 
Sbjct: 1 MPKKALDSRIPTLIKNGCQEKQRSFFVVVGDRARDQVVNLHWLLSQSKVAARPNVLWMYK 60 

Query: 61 KEL-GFSSHRKKRMRQLQKKIKNGTLNIKQDDPFELFIAATNIRYCYYNETHKILGNTFG 119 

K+L GF+SHRKKR +++K+IK G + +DPFELF + TNIRYCYY E+ KILG T+G 
Sbjct: 61 KDLLGFTSHRKKRENKIKKEIKRGIRDPNSEDPFELFCSITNIRYCYYKESEKILGQTYG 120 

Query: 120 MCVLQDFEALTPNLLARTVETVEGGGLVVILLRTMNSLKQLYTVTMDVHSRYRTEAHQDV 179 

M VLQDFEALT PNLLART + ETVEGGG+V V+LL +NSLKQLYT++MD+HSRYRTEAH DV 
Sbjct: 121 MLVLQDFEALTPNLLARTIETVEGGGIVVLLLHKLNSLKQLYTMSMDIHSRYRTEAHSDV 180 

Query: 180 VGRFNERFILSLASCKKCLVIDDQLNILPISSHVATMEALPPQTPDESLGPSDLELRELK 239 

RFNERFILSL +C+ CLVIDD+LN+LPIS ++ALPP +++ + ++EL+ 

Sbjct: 181 TARFNERFILSLGNCENCLVIDDELNVLPISGG-KNVKALPPTLEEDN--STQNSIKELQ 237 

Query: 240 ESLQDTQPVGVLVDCCKTLDQAKAVLKFIEGISEKTLRSTVALTAARGRGKSAALGLAIA 299 

ESL + P G LV KTLDQA+AVL F+E I EK+L+ TV+LTA RGRGKSAALGLAIA 
Sbjct: 238 ESLGEDHPAGALVGVTKTLDQARAVLTFVESIVEKSLKGTVSLTAGRGRGKSAALGLAIA 297 

Query: 300 GAVAFGYSNIFVTSPSPDNLHTLFEFVFKGFDALQYQEHLDYEIIQSLNPEFNKAVIRVN 359 

A+A GYSNIF+TSPSP+NL TLFEF+FKGFDAL Y+EH+DY+IIQS NP ++ A++RVN 
Sbjct: 298 AAIAHGYSNIFITSPSPENLKTLFEFIFKGFDALNYEEHVDYDIIQSTNPAYHNAIVRVN 357 

Query: 360 VFREHRQTIQYIHPADAVKLGQAELVVIDEAAAIPLPLVKSLLGPYLVFMASTINGYEGT 419 

+FR+HRQTIQYI P D+ LGQAELVVI DEAAAI PLPLV+ L+GPYLVFMASTINGYEGT 
Sbjct: 358 IFRDHRQTIQYISPEDSNVLGQAELVVIDEAAAIPLPLVRKLIGPYLVFMASTINGYEGT 417 

Query: 420 GRSLSLKLIQQLRQQSAQSQVSTTAENKTTTTARLASARTLHEVSLQESIRYAPGDAVEK 479 

GRSLSLKL+QQLR+QS S + NK+ + + + S RTL E+SL E IRYA GD +E 

Sbjct: 418 GRSLSLKLLQQLREQSRI--YSGSGNNKSDSQSHI-SGRTLKEISLDEPIRYAMGDRIEL 474 

Query: 480 WLNDLLCLDCLN-ITRIVS-GCPLPEACELYYVNRDTLFCYHKASEVFLQRLMALYVASH 537 

WLN LLCLD + ++R+ + G P P C LY V+RDTLF YH SE FLQR+M+LYVASH 
Sbjct: 475 WLNKLLCLDAASYVSRMATQGFPHPSECSLYRVSRDTLFSYHPISEAFLQRMMSLYVASH 534 

Query: 538 YKNSPNDLQMLSDAPAHHLFCLLPPVPPTQNALPEVLAVIQVCLEGEISRQSILNSLSRG 597 

YKNS PNDLQ++ S DAPAH LF LLPPV LP+ + VIQ+ LEG ISR+SI+NSLSRG 

Sbjct: 535 YKNSPNDLQLMSDAPAHQLFVLLPPVDLKNPKLPDPICVIQLALEGSISRESIMNSLSRG 594 

Query: 598 KKASGDLIPWTVSEQFQDPDFGGLSGGRVVRIAVHPDYQGMGYGSRALQLLQMYYEGRFP 657 

++A GDLIPW +S+QFQD +F L G R+VRIAV P++ MGYG+RA+QLL Y+EG+F 
Sbjct: 595 QRAGGDLIPWLISQQFQDENFAALGGARIVRIAVSPEHVKMGYGTRAMQLLHEYFEGKFI 654 

Query: 658 CLEEKVLETPQEIHTVSSEAV SLLEEVITPR — KDLPPLLLKLNERPAERLDYLGVS 712 

E+ ++E+ +LEIRK +PPLLLKL+E E L Y+GVS 

Sbjct: 655 SASEEFKAVKHSLKRIGDEEIENTALQTEKIHVRDAKTMPPLLLKLSELQPEPLHYVGVS 714 

Query: 713 YGLTPRLLKFWKRAGFVPVYLRQTPNDLTGEHSCIMLKTLTDEDEADQGGWLAAFWKDFR 772 

YGLTP L KFWKR G+ P+YLRQT NDLTGEH+C+ML+ L D WL AF ++F 

Sbjct: 715 YGLT PSLQKFWKREGYCPLYLRQTAN DLTGEHTC VMLRVLEGRDS E WLGAFAQNFY 770 

Query: 773 RRFLALLS YQFSTFSPSLALNI IQNRNMGKP AQPALSREELEALFLPYDLKRLEMY 828 

RRFL+LL YQF F+ AL+++ N G * L+ EE+ +F YDLKRLE Y 

Sbjct: 771 RRFLSLLGYQFREFAAITALSVLDACNNGTKYVVNSTSKLTNEEINNVFESYDLKRLESY 830 

Query: 829 SRNMVDYHLIMDMIPAISRIYFLNQLGD-LALSAAQSALLLGIGLQHKSVDQLEKEIELP 887 

S N++DYH+I+D++P ++ +YF + D + LS Q ++LL +GLQ+K++D LEKE LP 
Sbjct: 831 SNNLLDYHVIVDLLPKLAHLYFSGKFPDSVKLSPVQQSVLLALGLQYKTIDTLEKEFNLP 890 

Query: 888 SGQLMGLFNRIIRKVVKLFNEVQEKAIEEQMVAAKDVVME PTMKTLSDDLDE 939 

S QL+ + ++ +K++K +E++ K IEE++ + K P ++L ++L E 

Sbjct: 891 SNQLLAMLVKLSKKIMKCIDEIETKDIEEELGSNKKTESSNSKLPEFTPLQQSLEEELQE 950 

Query: 940 AAKEFQ-EKHKKEVGKLKSMDLSEYIIRGDDEEWNEVLNKAGPNASIISLKSDKKRKLEA 998 

A E +K+ + ++DL +Y IRG++E+W KA N I R + 

Sbjct: 951 GADEAMLALREKQRELI NAI DLEK YAI RGNEEDW KAAEN-QIQKTNGKGARVVSI 1004 

Query: 999 KQEPKQSKKL--KNRETKNKKDMKLKRKK 1025 

K E +++ L +++TK K K K +K 
Sbjct: 1005 KGEKRKNNSLDASDKKTKEKPSSKKKFRK 1033 

Pedant information for DKFZphtes3_6cll, frame 3 

Report for DKFZphtes3_6cll.3 

[LENGTH] 1025 

[MW] 115704.57 

[pI] 8.50 

[HOMOL] PIR:S55151 probable membrane protein YNL132w - yeast (Saccharomyces cerevisiae) 0.0 

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YNL132w] 0.0 

[FUNCAT] r general function prediction [H. influenzae, HI1254] 2e-05 

[PROSITE] ATP_GTP_A 1 

[PROSITE] RGD 1 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 11.80 % 



SEQ MHRKKVDNRIRILIENGVAERQRSLFVVVGDRGKDQVVILHHMLSKATVKARPSVLWCYK 
SEG 
PRD cccccccchhhhhhcccccccceeeeeeeeccccceeeeehhhhhhhhhhccceeehhhh 

SEQ KELGFSSHRKKRMRQLQKKIKNGTLNIKQDDPFELFIAATNIRYCYYNETHKILGNTFGM 
SEG 
PRD hhhcccchhhhhhhhhhhhhhhhcccccccccceeeecccceeeeeccccceeeccccee 

SEQ CVLQDFEALTPNLLARTVETVEGGGLVVILLRTMNSLKQLYTVTMDVHSRYRTEAHQDVV 
SEG xxxxxxxxxxxxxxx 
PRD eehhhhhccccchhhhhhhhhcccceeeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ GRFNERFILSLASCKKCLVIDDQLNILPISSHVATMEALPPQTPDESLGPSDLELRELKE 
SEG 
PRD hhhhhhhhhhhcccceeeeeecceeeecccccccccccccccccccccccchhhhhhhhh 

SEQ SLQDTQPVGVLVDCCKTLDQAKAVLKFIEGISEKTLRSTVALTAARGRGKSAALGLAIAG 
SEG xxxxxxxxx 
PRD hhcccccceeeeehhhhhhhhhhhhhhhhhhhhhhhhhhheeeccccccchhhhhhhhhh 

SEQ AVAFGYSNIFVTSPSPDNLHTLFEFVFKGFDALQYQEHLDYEIIQSLNPEFNKAVIRVNV 
SEG xxx 
PRD hhhhcccceeecccccccchhhhhhhhhhhhhhhhhhhhhheeeeeccccccceeeeeeh 

SEQ FREHRQTIQYIHPADAVKLGQAELVVIDEAAAIPLPLVKSLLGPYLVFMASTINGYEGTG 
SEG 
PRD hhhhhhheeeeccccccccccceeeehhhhhccchhhhhhhccceeeeeeeccccccccc 

SEQ RSLSLKLIQQLRQQSAQSQVSTTAENKTTTTARLASARTLHEVSLQESIRYAPGDAVEKW 
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
PRD cchhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhhhhhhceeeccccchhhh 

SEQ LNDLLCLDCLNITRIVSGCPLPEACELYYVNRDTLFCYHKASEVFLQRLMALYVASHYKN 
SEG xxxxxxxxxxx 
PRD hhhhhhcccccceeeccccccccceeeeeeeccccccccchhhhhhhhhhhhhhhhhccc 

SEQ SPNDLQMLSDAPAHHLFCLLPPVPPTQNALPEVLAVIQVCLEGEISRQSILNSLSRGKKA 
SEG 
PRD cccccccccccccceeeeeeccccccccccchhhhhhhhhhccccchhhhhhhhcccccc 

SEQ SGDLIPWTVSEQFQDPDFGGLSGGRVVRIAVHPDYQGMGYGSRALQLLQMYYEGRFPCLE 
SEG 
PRD cccchhhhhhhhhhhccccccccceeeeeeccccccccccchhhhhhhhhhhhcccchhh 

SEQ EKVLETPQEIHTVSSEAVSLLEEVITPRKDLPPLLLKLNERPAERLDYLGVSYGLTPRLL 
SEG xxxxxxxxxx 
PRD hhhhhccccccchhhhhhhhhhhhhhccccccccccccccccccceeeeccccccchhhh 

SEQ KFWKRAGFVPVYLRQTPNDLTGEHSCIMLKTLTDEDEADQGGWLAAFWKDFRRRFLALLS 



PRD hhhhhcccceeeeeccccccccceeeeeeecccccccccchhhhhhhhhhhhhhhhhhhh 

SEQ YQFSTFSPSLALNIIQNRNMGKPAQPALSREELEALFLPYDLKRLEMYSRNMVDYHLIMD 

SEG 

PRD hhhhcchhhhhhhhhhhcccccccchhhhhhhhhhhhccchhhhhhhhhccchhhhhhhh 

SEQ MIPAISRIYFLNQLGDLALSAAQSALLLGIGLQHKSVDQLEKEIELPSGQLMGLFNRIIR 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhcccchhhhhhhhhhhhhhcchhhhhhhhhhhhhccccchhhhhhhhhh 

SEQ KVVKLFNEVQEKAIEEQMVAAKDVVMEPTMKTLSDDLDEAAKEFQEKHKKEVGKLKSMDL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhcc 

SEQ SEYIIRGDDEEWNEVLNKAGPNASIISLKSDKKRKLEAKQEPKQSKKLKNRETKNKKDMK 

SEG xxxxxxxxxxxxxxx 

PRD cceeecccchhhhhhhhhccccceeeeeeccchhhhhhhhcccccccccccccccchhhh 

SEQ LKRKK 

SEG xxxxx 

PRD hhccc 



Prosite for DKFZphtes3_6cll.3 

PS00016 966->969 RGD PDOC00016 

PS00017 284->292 ATP_GTP_A PDOC00017 
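
The two Prosite hits above can be reproduced with a simple pattern scan. The sketch below assumes the Prosite definitions R-G-D for PS00016 and [AG]-x(4)-G-K-[ST] for PS00017, written here as regular expressions; the function name, the dictionary and the `offset` parameter are illustrative and not part of the PEDANT output. It also assumes that the second coordinate in the listing is one past the last matched residue, which is inferred from the RGD hit being reported as 966->969 for a three-residue motif.

```python
import re

# Prosite patterns rewritten as regular expressions (assumed equivalents)
PATTERNS = {
    "RGD": r"RGD",                   # PS00016: R-G-D
    "ATP_GTP_A": r"[AG].{4}GK[ST]",  # PS00017: [AG]-x(4)-G-K-[ST]
}

def scan(seq: str, offset: int = 1) -> list[tuple[str, int, int]]:
    """Return (name, start, end) hits; offset is the 1-based position of seq[0].

    start is the 1-based first residue, end is one past the last residue.
    """
    hits = []
    for name, pattern in PATTERNS.items():
        for m in re.finditer(pattern, seq):
            hits.append((name, m.start() + offset, m.end() + offset))
    return hits

# Residues 281-300 and 961-970 of the DKFZphtes3_6cll peptide, from the listing
print(scan("ALTAARGRGKSAALGLAIAG", offset=281))
print(scan("SEYIIRGDDE", offset=961))
```

Scanning short fragments keeps the example small; the same function can be applied to the full peptide with the default offset of 1.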



(No Pfam data available for DKFZphtes3_6cll.3) 

DKFZphtes3_6dl6 



group: testes derived 

DKFZphtes3_6dl6 encodes a novel 695 amino acid protein nearly identical to a sequence from 
human PAC clone WUGSC:H_DJ1185I07.2. 

The cDNA differs from the proposed gene model: it contains additional exons. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



WUGSC:H_DJ1185I07.2, differences to gene model 

differences to gene model of WUGSC:H_DJ1185I07.2: two exons skipped 

Sequenced by BMFZ 

Locus: /map="7q11.23-q21" 

Insert length: 4572 bp 

Poly A stretch at pos. 4540, polyadenylation signal at pos. 4520 



1 GGCGGCGCTA GCTTCGGAGT CTCCCGCGCG CACCTCAGCC GCCTCCTAGC 
51 GGCGCGGCGC TCGCTCCTAC GCCTAAAATG ACCAATGTGT GATTTCAGTG 
101 GAATAAATGG CGTCCAAAGT CACAGATGCT ATAGTCTGGT ATCAAAAGAA 
151 GATTGGAGCA TATGATCAAC AAATATGGGA AAAATCTGTT GAACAGAGAG 
201 AAATCAAGGG GCTAAGGAAT AAACCAAAGA AAACAGCACA TGTGAAACCA 
251 GACCTCATAG ATGTTGATCT TGTAAGAGGG TCTGCATTTG CAAAGGCAAA 
301 GCCTGAAAGT CCTTGGACTT CTCTGACCAG AAAGGGAATT GTTCGAGTTG 
351 TATTTTTCCC CTTTTTCTTC CGGTGGTGGT TACAAGTAAC ATCAAAGGTC 
401 ATCTTTTTCT GGCTTCTTGT CCTTTATCTT CTTCAAGTTG CTGCAATAGT 
451 ATTATTCTGC TCCACTTCTA GCCCACACAG CATACCTCTG ACAGAGGTGA 
501 TTGGGCCGAT ATGGCTGATG CTGCTCCTGG GAACTGTGCA TTGCCAGATT 
551 GTTTCCACAA GAACACCCAA ACCTCCTCTA AGTACAGGGG GTAAAAGAAG 
601 AAGGAAATTA AGAAAAGCAG CCCATTTGGA AGTACATAGG GAAGGAGATG 
651 GTTCTAGTAC CACAGATAAC ACACAAGAGG GAGCAGTTCA GAACCACGGT 
701 ACAAGCACCT CTCACAGCGT TGQCACTGTC TTCAGAGATC TCTGGCATGC 
751 TGCTTTCTTT TTATCAGGAT CAAAGAAAGC AAAGAATTCA ATTGATAAAT 
801 CAACTGAAAC TGACAATGGC TATGTATCCC TTGATGGGAA GAAGACTGTT 
851 AAAAGCGGTG AAGATGGAAT ACAAAACCAT GAACCTCAGT GTGAAACTAT 
901 TCGACCAGAA GAGACAGCCT GGAACACAGG AACACTGAGG AATGGTCCTA 
951 GCAAAGATAC CCAAAGGACA ATAACAAATG TCTCTGATGA AGTCTCCAGT 
1001 GAGGAAGGTC CTGAAACAGG ATACTCATTA CGTCGTCATG TGGACAGGAC 
1051 TTCTGAAGGT GTTCTTCGGA ATAGAAAGTC ACACCATTAT AAGAAACATT 
1101 ACCCTAATGA GGACGCCCCT AAATCGGGTA CTAGTTGCAG CTCTCGCTGT 
1151 TCAAGTTCCA GACAGGATTC TGAGAGTGCA AGGCCAGAAT CTGAAACAGA 
1201 AGATGTGTTA TGGGAAGACT TGTTACATTG TGCAGAATGC CATTCATCTT 
1251 GTACCAGTGA GACAGATGTG GAAAATCATC AGATTAATCC ATGTGTGAAA 
1301 AAAGAATATA GAGATGACCC TTTTCATCAG AGTCATTTGC CCTGGCTCCA 
1351 TAGTTCCCAC CCAGGATTAG AAAAAATAAG TGCTATAGTA TGGGAAGGTA 
1401 ATGATTGTAA GAAAGCAGAC ATGTCTGTAC TTGAAATCAG TGGAATGATA 
1451 ATGAACAGAG TGAACAGCCA TATACCAGGA ATAGGATACC AGATTTTTGG 
1501 AAATGCAGTC TCTCTCATAC TGGGTTTAAC TCCATTTGTT TTCCGACTTT 
1551 CTCAAGCTAC AGACTTGGAA CAACTCACAG CACATTCTGC TTCAGAACTT 
1601 TATGTGATTG CATTTGGTTC TAATGAAGAT GTCATAGTTC TTTCTATGGT 
1651 TATAATAAGT TTTGTGGTTC GCGTGTCTCT TGTGTGGATT TTCTTTTTTT 
1701 TGCTCTGTGT AGCAGAAAGA ACTTATAAAC AGCGATTACT TTTTGCAAAA 
1751 CTCTTTGGAC ATTTAACATC TGCAAGGAGG GCTCGAAAAT CTGAGGTTCC 
1801 TCATTTCCGG TTGAAGAAAG TACAGAATAT AAAAATGTGG CTATCTCTCC 
1851 GTTCCTATCT TAAGCGTCGA GGTCCTCAGC GATCAGTTGA TGTAATAGTT 
1901 TCATCTGCTT TCTTATTGAC TATCTCAGTT GTATTTATCT GTTGTGCCCA 
1951 GATAAACCTC TACTTGAAAA TGGAGAAAAA ACCTAACAAA AAGGAGGAAC 
2001 TGACACTAGT GAATAATGTT TTAAAACTGG CTACTAAACT GCTAAAGGAG 
2051 TTGGACAGTC CTTTTAGATT ATATGGGCTT ACAATGAATC CGCTGCTTTA 
2101 TAACATCACC CAGGTTGTTA TCCTGTCAGC TGTTTCTGGT GTTATCAGTG 
2151 ACTTGCTTGG ATTTAATTTA AAGCTATGGA AGATTAAGTC ATGACAATTC 
2201 AAAGAAAAGA AGATGTAGCC TCTTTTCCAG AATAAGAGTA CTGACTAAGC 
2251 TGCCTGAAAG CTTGTCACTG ATTCTTTGCT TCAGGAGTCT CAGCTAGGGA 
2301 GTTGAAGTGT TTACATCAGA CTGTCTTGTG CAATTCTTAT ATTTATTTTA 
2351 CTGGTTCACT TTTTTTTACA TTTATTTTAG TCTTTATATT TTTATTTTTA 
2401 AGCATTGATG TACTTAGTTG TTGAAAGGGT GATGAAACTG ATATCCAGAT 
2451 ACTTGAGATC CTGGTAATTG GTCATAAATA ATTGGCAAAA TAACAAATTG 
2501 TGAAAATAGA AGCCATTGCT CAGCACCGTT TCTCCATCAA TGCCGTGAAC 
2551 TTGCCTTACT TGAGGAAAAA TTCTTTAACT TTGGAATATT GCATTGAACT 
2601 CAGCTATACA CATAAAACAT TTTCTTTGGT AAATCAAGAT CCAGTCAGGG 
2651 TTTCTCTTGA ATTATTTTGG AACAATGCCA GGATCCAAAC TGATTAAGTT 
2701 ACAGTTTAAG CACCCTTCAG TATTAATATA TACGGTATTA TATAACAGGT 
2751 CAACAAGTGC TCTTTGATGA TAAAACTTGT AATAGAGCAA TAATTGTAAA 
2801 TGGTTACCAT ACTGTAAGAT ATTTTGATAA AAATTAACTA GTAATACTTG 
2851 TATTTATTTG AAACACTGGG CTGTTTGCAC AGCTCCAACT GTGCATGCTC 
2901 AAAATGTGCA CTTTTTAAAA TTGTTACTTT TAATGCGTAT CTTTATATGG 
2951 GATCTGTTAT AGTATACTAG GGCATGATAT GGTATCCTTT TGAGTGAGGT 
3001 ATATACTCAT CTCACAAGTG AAGTGCCTAC TGATATTACT AAAGTACATT 
3051 ATGTTTACTC AAGTAAATAA TTTTCTCCCC ATGGTACACT CTAGTGTAGG 
3101 CTATTCATAC CACACTGAAA TGAACAACTG AAGAATAAGG CTAAGAACCA 
3151 ATAAAATATT TCTCTAATTG CTAGTTGTAA AACTGTATCC AAATTTTCAG 
3201 AAAAGACAGC TTCAGCTTGC AAATTCTATC CTCTAAACTT ATCTGGTGCA 
3251 TTCTCCCCAC CCCACCCCCA TTATATAAGG GCTATTTTAG ATGCTTTTAA 
3301 CCTCCCCAAC AAATAATTTG CCAAGTGTCC AATGAGAACT TATCATGTTG 
3351 GTGTGTTAGG TAAATCGGGC AAATATGATA GTGTCTTACA TTGGGCCTTG 
3401 ATTTTAAGTT GTTATATTTG TACAATCGAG TATTTTAGAA ATTACATGAA 
3451 ACATGAAACA GTTTTTGCAA TTTTTTTTAA ACTGGGCATC TGGTTTCTAA 
3501 AAATTTATTT GAAACAATCT AGAATTTTCT TGGTGCAAAG TGTATCATGT 
3551 GGAATATCCT CATATTTTTA CCATATTTTA AGAACTTTAA GACGATTAAT 
3601 TGTAAATAAT TTATTTGATT GGTGCAGTTC TAATCCCTAA ATCATAATCT 
3651 TAAAATCAGG AATGTGTGGA GAACAGAGCC ATGTCATATC ACTTTGCTCT 
3701 TACCATTCCT TTTGATCAGC CTCAATTCAG CCTCATTGTG TAGTATGTTT 
3751 TTTCTTTCTA TGAAAAACAA CAGAAAGCAT TTCATTTTAT TTGCCTATGT 
3801 TCAAATATGT TTAATAATGA CCAAAGTGCA TTCTGAGTTT TTTCAAGGAA 
3851 TGTAATACTG GAGCTTTAAG AACATACTTA GTTTCTCATG TGAAAACTTA 
3901 GGCTTTGTCT GATGTTTTTC CTTCCTCTAT TGTCTAATGT TGAGGTTGTT 
3951 TTTAGGAATT ATGTTTTATA AACTTTTTCA ATATAAGGTA CATGCCTATA 
4001 CAGAACTTAA CATTTTGCAC AGAATATATC AAATATATTT TGAGAAAAAA 
4051 AGTACGGCAT GAGTTCTGTT AGGAATAAAA GATGAAACTA TTGTATCTCA 
4101 CAAAAAATCT TATTTCAGAA TGGAAATATT TTTGAGAAAA GTAGCTGAGT 
4151 ATACTGGTTT AAGAAAATGC TTGTTTTAGA TTGAGGTTAA CTTAGAGTTG 
4201 GGAGTTGATT TATTAAGTAC AGTATACCTC TCAACAGTTT ATAAATAATA 
4251 TGTTGAATTA TGTCAGTGTG GGCAGCAGTA GAATACTAAA AGGAAAATGT 
4301 CATGTTAAGC AATTTCAGAA CATTAACTGA ACTATTTTCA AAGCAGAAAA 
4351 ATTGACATTG CTGCCTTTAA GAATACCATG AATGTAAGAA ATTGAAAGAA 
4401 ATTGTAAAAT ATCACATAAT ATAGAAATGG CAGTTCAAAG AGAATTGTGG 
4451 CAGATGTTGT GTGTGAACTG TTGTTTCTTT GCCACATGTG TTGTATTTGA 
4501 AAGTTTTACA GTAAGTTTAA AATAAAACAT TCTGTGACTG AAAAAAAAAA 
4551 AAAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 107 bp to 2191 bp; peptide length: 695 

Category: known protein 

Classification: unclassified 

Prosite motifs: CYTOCHROME_C (375-381) 



1 MASKVTDAIV WYQKKIGAYD QQIWEKSVEQ REIKGLRNKP KKTAHVKPDL 
51 IDVDLVRGSA FAKAKPESPW TSLTRKGIVR VVFFPFFFRW WLQVTSKVIF 
101 FWLLVLYLLQ VAAIVLFCST SSPHSIPLTE VIGPIWLMLL LGTVHCQIVS 
151 TRTPKPPLST GGKRRRKLRK AAHLEVHREG DGSSTTDNTQ EGAVQNHGTS 
201 TSHSVGTVFR DLWHAAFFLS GSKKAKNSID KSTETDNGYV SLDGKKTVKS 
251 GEDGIQNHEP QCETIRPEET AWNTGTLRNG PSKDTQRTIT NVSDEVSSEE 
301 GPETGYSLRR HVDRTSEGVL RNRKSHHYKK HYPNEDAPKS GTSCSSRCSS 
351 SRQDSESARP ESETEDVLWE DLLHCAECHS SCTSETDVEN HQINPCVKKE 
401 YRDDPFHQSH LPWLHSSHPG LEKISAIVWE GNDCKKADMS VLEISGMIMN 
451 RVNSHIPGIG YQIFGNAVSL ILGLTPFVFR LSQATDLEQL TAHSASELYV 
501 IAFGSNEDVI VLSMVIISFV VRVSLVWIFF FLLCVAERTY KQRLLFAKLF 
551 GHLTSARRAR KSEVPHFRLK KVQNIKMWLS LRSYLKRRGP QRSVDVIVSS 
601 AFLLTISVVF ICCAQINLYL KMEKKPNKKE ELTLVNNVLK LATKLLKELD 
651 SPFRLYGLTM NPLLYNITQV VILSAVSGVI SDLLGFNLKL WKIKS 
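
The peptide above is the conceptual translation of the ORF reported for frame 2. A minimal Python sketch of that translation step follows, assuming the standard genetic code and the 1-based, inclusive coordinates used in this listing. The function name `translate_orf` and the variable names are illustrative choices and are not part of the original analysis pipeline. The fragment in the usage lines is positions 101-130 of the nucleotide sequence above, so the ORF start at 107 falls on position 7 of the fragment.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order
# (third base varies fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDDDEEEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_orf(dna, start, end):
    """Translate dna[start..end] (1-based, inclusive) with the standard code.

    Hypothetical helper for illustration: the span must be a whole number
    of codons, mirroring how the ORF coordinates are given in the listing.
    """
    orf = dna.upper()[start - 1:end]
    if len(orf) % 3 != 0:
        raise ValueError("ORF span is not a multiple of three nucleotides")
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


# Positions 101-130 of the insert; the ORF starts at position 107.
fragment_101_130 = "GAATAAATGGCGTCCAAAGTCACAGATGCT"
first_residues = translate_orf(fragment_101_130, 7, 30)
print(first_residues)
```

The eight residues obtained this way agree with the start of the printed peptide (MASKVTDA).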

BLASTP hits 



919 



WO 01/12659 



PCT/IB00/01496 



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_6dl6, frame 2 

PIR:S38170 SRP40 protein - yeast (Saccharomyces cerevisiae), N = 1, 
Score = 100, P = 0.08 

TREMBL:AC004990_1 gene: "WUGSC:H_DJ1185I07.2"; Homo sapiens PAC clone 
DJ1185I07 from 7q11.23-q21, complete sequence., N = 2, Score = 2693, P = 0 

>TREMBL:AC004990_1 gene: "WUGSC:H_DJ1185I07.2"; Homo sapiens PAC clone 
DJ1185I07 from 7q11.23-q21, complete sequence. 
Length = 588 

HSPs: 

Score = 2693 (404.1 bits), Expect = 0.0e+00, Sum P(2) = 0.0e+00 
Identities = 510/515 (99%), Positives = 512/515 (99%) 

Query: 35 GLRNKPKKTAHVKPDLIDVDLVRGSAFAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQV 94 
GLRNKPKKTAHVKPDLIDVDLVRGSAFAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQV 
Sbjct: 1 GLRNKPKKTAHVKPDLIDVDLVRGSAFAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQV 60 

Query: 95 TSKVIFFWLLVLYLLQVAAIVLFCSTSSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTP 154 
TSKVIFFWLLVLYLLQVAAIVLFCSTSSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTP 
Sbjct: 61 TSKVIFFWLLVLYLLQVAAIVLFCSTSSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTP 120 

Query: 155 KPPLSTGGKRRRKLRKAAHLEVHREGDGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWH 214 
KPPLSTGGKRRRKLRKAAHLEVHREGDGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWH 
Sbjct: 121 KPPLSTGGKRRRKLRKAAHLEVHREGDGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWH 180 

Query: 215 AAFFLSGSKKAKNSIDKSTETDNGYVSLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNT 274 
AAFFLSGSKKAKNSIDKSTETDNGYVSLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNT 
Sbjct: 181 AAFFLSGSKKAKNSIDKSTETDNGYVSLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNT 240 

Query: 275 GTLRNGPSKDTQRTITNVSDEVSSEEGPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPN 334 
GTLRNGPSKDTQRTITNVSDEVSSEEGPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPN 
Sbjct: 241 GTLRNGPSKDTQRTITNVSDEVSSEEGPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPN 300 

Query: 335 EDAPKSGTSCSSRCSSSRQDSESARPESETEDVLWEDLLHCAECHSSCTSETDVENHQIN 394 
EDAPKSGTSCSSRCSSSRQDSESARPESETEDVLWEDLLHCAECHSSCTSETDVENHQIN 
Sbjct: 301 EDAPKSGTSCSSRCSSSRQDSESARPESETEDVLWEDLLHCAECHSSCTSETDVENHQIN 360 

Query: 395 PCVKKEYRDDPFHQSHLPWLHSSHPGLEKISAIVWEGNDCKKADMSVLEISGMIMNRVNS 454 
PCVKKEYRDDPFHQSHLPWLHSSHPGLEKISAIVWEGNDCKKADMSVLEISGMIMNRVNS 
Sbjct: 361 PCVKKEYRDDPFHQSHLPWLHSSHPGLEKISAIVWEGNDCKKADMSVLEISGMIMNRVNS 420 

Query: 455 HIPGIGYQIFGNAVSLILGLTPFVFRLSQATDLEQLTAHSASELYVIAFGSNEDVIVLSM 514 
HIPGIGYQIFGNAVSLILGLTPFVFRLSQATDLEQLTAHSASELYVIAFGSNEDVIVLSM 
Sbjct: 421 HIPGIGYQIFGNAVSLILGLTPFVFRLSQATDLEQLTAHSASELYVIAFGSNEDVIVLSM 480 

Query: 515 VIISFVVRVSLVWIFFFLLCVAERTYKQRLLFAKL 549 
VIISFVVRVSLVWIFFFLLCVAERTYKQ  L+ K+ 
Sbjct: 481 VIISFVVRVSLVWIFFFLLCVAERTYKQINLYLKM 515 

Score = 409 (61.4 bits), Expect = 0.0e+00, Sum P(2) = 0.0e+00 
Identities = 92/115 (80%), Positives = 98/115 (85%) 

Query: 595 DVIVSSAFLLTISVVFICCAQINLYLKMEKKPNKKEELTLVNNVLK 640 
DVIV S +F++ +S+V+I C A QINLYLKMEKKPNKKEELTLVNNVLK 
Sbjct: 474 DVIVLSMVIISFVVRVSLVWIFFFLLCVAERTYKQINLYLKMEKKPNKKEELTLVNNVLK 533 

Query: 641 LATKLLKELDSPFRLYGLTMNPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS 695 
LATKLLKELDSPFRLYGLTMNPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS 
Sbjct: 534 LATKLLKELDSPFRLYGLTMNPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS 588 

Pedant information for DKFZphtes3_6dl6, frame 2 



Report for DKFZphtes3_6dl6.2 



[LENGTH] 695 

[MW] 78466.68 

[pI] 9.30 

[HOMOL] TREMBL:AC004990_1 gene: "WUGSC:H_DJ1185I07.2"; Homo sapiens PAC clone DJ1185I07 
from 7q11.23-q21, complete sequence. 0.0 

[PROSITE] CYTOCHROME_C 1 

[KW] TRANSMEMBRANE 6 

[KW] LOW_COMPLEXITY 5.32 % 

SEQ MASKVTDAIVWYQKKIGAYDQQIWEKSVEQREIKGLRNKPKKTAHVKPDLIDVDLVRGSA 

SEG 

PRD ccceeeeehhhhhhhcccchhhhhhhhhhhhhhhcccccccccccccccceeeeeeccch 

MEM 

SEQ FAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQVTSKVIFFWLLVLYLLQVAAIVLFCST 

SEG xxxxxxxxxxx 

PRD hhhhcccccccccccccceeeeecchhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeecc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ SSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTPKPPLSTGGKRRRKLRKAAHLEVHREG 

SEG xxxxxxxx 

PRD ccccccceeeeehhhhhhhhhhhhheeeeeeccccccccccchhhhhhhhhhhhheeecc 

MEM MMMMMMMMMMMMMMMMMMMMMMM 

SEQ DGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWHAAFFLSGSKKAKNSIDKSTETDNGYV 

SEG 

PRD cccccccccceeeeeeccccccccchhhhhhhhhhhhhhcccchhhhhcccccccccccc 

MEM 

SEQ SLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNTGTLRNGPSKDTQRTITNVSDEVSSEE 

SEG 

PRD cccccceeecccccccccccccccccccceeeeccccccccccccceeeecccccccccc 

MEM 

SEQ GPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPNEDAPKSGTSCSSRCSSSRQDSESARP 

SEG xxxxxxxxxxxxxxxxxx 

PRD ccccceeeeeeccccccchhhhhhcccccccccccccccccccccccccccccccccccc 

MEM 

SEQ ESETEDVLWEDLLHCAECHSSCTSETDVENHQINPCVKKEYRDDPFHQSHLPWLHSSHPG 

SEG 

PRD cccchhhhhhhhhhhhcccccccccccccccccccceeeeeccccccccccccccccccc 

MEM 

SEQ LEKISAIVWEGNDCKKADMSVLEISGMIMNRVNSHIPGIGYQIFGNAVSLILGLTPFVFR 

SEG 

PRD cccceeeeeecccccccceeeeehhhhhhhhhccccccccccccccccceeecccccchh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ LSQATDLEQLTAHSASELYVIAFGSNEDVIVLSMVIISFVVRVSLVWIFFFLLCVAERTY 

SEG 

PRD hhhhhhhhhhhhcccceeeeeeeccccceeeehhhhhhhhcchhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ KQRLLFAKLFGHLTSARRARKSEVPHFRLKKVQNIKMWLSLRSYLKRRGPQRSVDVIVSS 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhccccccceeeeeehhhhhhhhhhhhccccceeeeeeee 

MEM MMMMMMM 

SEQ AFLLTISVVFICCAQINLYLKMEKKPNKKEELTLVNNVLKLATKLLKELDSPFRLYGLTM 

SEG 

PRD eeeeeeeeeeeeeehhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhccccceeeeccc 

MEM MMMMMMMMMMMMMMMMMMM 

SEQ NPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS 

SEG 

PRD cchhhhheeeeeeeeecchhhhhccceeeeeeccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMM 



Prosite for DKFZphtes3_6dl6.2 
PS00190 375->381 CYTOCHROME_C PDOC00169 

(No Pfam data available for DKFZphtes3_6dl6.2) 

DKFZphtes3_72kll 



group: testes derived 

DKFZphtes3_72kll encodes a novel 233 amino acid protein with similarity to an S. pombe 
hypothetical repeat-containing protein. 

The novel protein contains 5 leucine zippers and a microbodies C-terminal targeting signal 
(S-K-L) signature. This sequence is responsible for transport of proteins from free polysomes 
into the microbodies. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to S.pombe hypothetical repeat-containing protein 

complete cDNA, complete cds, 6 EST hits (3 from testis-derived 
libraries) 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1134 bp 

Poly A stretch at pos. 1124, polyadenylation signal at pos. 1088 

1 AACCTTTCAA GTGCCCCCTC CTTTCCTTAA AGTCTTTTAT AGGGGTCCCC 
51 TTCTTGGCCA TCTCCATCCT GTGAGTCAGG ACTGAAAGGG CACAGACAGG 

101 TCACTGCCAG CATTGTTGGG GCAAGCCTGC AAGCACGCAT CACTGGGGAT 

151 CTGACATGAC AATGGCCGCC TGCCCCCTCT GAGGGCTACA GGACTTACCC 

201 CAGTGGGAAG CAGCTAAGCA GGTCTGACCA GCCGACCTGG ACCTGGCCAA 

251 GGGTCCTGTC ATCCCTCATG GCCACCCCGC CATTCCGGCT GATAAGGAAG 

301 ATGTTTTCCT TCAAGGTGAG CAGATGGATG GGGCTTGCCT GCTTCCGGTC 

351 CCTGGCGGCA TCCTCTCCCA GTATTCGCCA GAAGAAACTA ATGCACAAGC 

401 TGCAGGAGGA AAAGGCTTTT CGCGAAGAGA TGAAAATTTT TCGTGAAAAA 

451 ATAGAGGACT TCAGGGAAGA GATGTGGACT TTCCGAGGCA AGATCCATGC 

501 TTTCCGGGGC CAGATCCTGG GTTTTTGGGA AGAGGAGAGA CCTTTCTGGG 

551 AAGAGGAGAA AACCTTCTGG AAAGAGGAAA AATCCTTCTG GGAAATGGAA 

601 AAGTCTTTCA GGGAGGAAGA GAAAACTTTC TGGAAAAAGT ACCGCACTTT 

651 CTGGAAGGAG GATAAGGCCT TCTGGAAAGA GGACAATGCC TTATGGGAAA 

701 GAGACCGGAA CCTTCTTCAG GAGGACAAGG CCCTGTGGGA GGAAGAAAAG 

751 GCCCTGTGGG TAGAGGAAAG AGCCCTCCTT GAGGGGGAGA AAGCCCTGTG 

801 GGAAGATAAA ACGTCCCTCT GGGAGGAAGA GAATGCCCTC TGGGAGGAAG 

851 AGAGGGCCTT CTGGATGGAG AACAATGGCC ACGTTGCCGG AGAGCAGATG 

901 CTCGAAGATG GGCCCCACAA CGCCAACAGA GGGCAGCGCT TGCTGGCCTT 

951 CTCCCGAGGC AGGGCGTAGC CAGCATGCAG GTGCAGGGCC CTGTGGTCCA 
1001 GACTCCCCTG GGTTGGGATT CAAGTCCAGG GTGAGCCCAT GTGCTGGAGA 
1051 AAATACACAC TCATTGGTCT CCTTGCTTTG AAAGATCCAA TAAAGTCCTG 
1101 AGGCAAGGTT TGGAAAACCA ACTTAAAAAA AAAA 
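
The poly(A) and polyadenylation-signal positions reported for this insert can be reproduced with a simple scan of the 3' end. The sketch below is an illustration only: it assumes the signal is the last AATAAA or ATTAAA hexamer upstream of the terminal A run, and that positions are counted from zero, which is what the sequences printed in this listing appear to be consistent with. The function name `find_polya_features` and the `offset` parameter are hypothetical.

```python
def find_polya_features(seq, offset=0, signals=("AATAAA", "ATTAAA")):
    """Return (poly_a_start, signal_start) as 0-based positions.

    seq     -- 3'-terminal part of the insert (or the whole insert)
    offset  -- 0-based position of seq[0] within the full insert
    signals -- hexamers accepted as polyadenylation signals (an assumption)
    Either element is None when the feature is not found.
    """
    seq = seq.upper()
    tail_start = len(seq.rstrip("A"))          # first base of the terminal A run
    has_tail = tail_start < len(seq)
    search_end = tail_start if has_tail else len(seq)
    # Last occurrence of any accepted hexamer lying fully upstream of the tail.
    hit = max(seq.rfind(sig, 0, search_end) for sig in signals)
    poly_a = tail_start + offset if has_tail else None
    signal = hit + offset if hit >= 0 else None
    return poly_a, signal


# Positions 1051-1134 of the DKFZphtes3_72kll insert as printed above.
tail_72kll = ("AAATACACACTCATTGGTCTCCTTGCTTTGAAAGATCCAATAAAGTCCTG"
              "AGGCAAGGTTTGGAAAACCAACTTAAAAAAAAAA")
print(find_polya_features(tail_72kll, offset=1050))
```

With the fragment above the scan returns the poly(A) start and signal position stated for this clone (1124 and 1088).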

BLAST Results 



No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 1 



ORF from 268 bp to 966 bp; peptide length: 233 
Category: similarity to known protein 
Prosite motifs: MICROBODIES_CTER (231-234) 
LEUCINE_ZIPPER (142-164) 
LEUCINE_ZIPPER (149-171) 
LEUCINE_ZIPPER (156-178) 
LEUCINE_ZIPPER (163-185) 
LEUCINE_ZIPPER (170-192) 
1 MATPPFRLIR KMFSFKVSRW MGLACFRSLA ASSPSIRQKK LMHKLQEEKA 
51 FREEMKIFRE KIEDFREEMW TFRGKIHAFR GQILGFWEEE RPFWEEEKTF 
101 WKEEKSFWEM EKSFREEEKT FWKKYRTFWK EDKAFWKEDN ALWERDRNLL 
151 QEDKALWEEE KALWVEERAL LEGEKALWED KTSLWEEENA LWEEERAFWM 
201 ENNGHVAGEQ MLEDGPHNAN RGQRLLAFSR GRA 
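
The Prosite motif hits listed for this peptide can be checked by converting a PROSITE pattern into a regular expression and scanning the sequence for overlapping matches. The following is a minimal sketch, not the tool used to produce this report: the helper names `prosite_to_regex` and `scan` are hypothetical, the converter handles only the simple pattern syntax needed here (single residues, `x`, `[...]`, `{...}` and a fixed repeat count), and the patterns are the published PROSITE definitions for PS00005, PS00006 and PS00029. Hits are returned as a 1-based start and an end one past the last matched residue, which is the convention the ranges in this listing appear to follow.

```python
import re

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)\))?")


def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern, e.g. 'L-x(6)-L', to a regex string."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported pattern element: " + element)
        core, count = match.groups()
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{"):           # excluded residues
            core = "[^" + core[1:-1] + "]"
        parts.append(core + ("{" + count + "}" if count else ""))
    return "".join(parts)


def scan(pattern, seq):
    """Return overlapping hits as (start, end_plus_one), 1-based."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.start() + 1 + len(m.group(1)))
            for m in regex.finditer(seq)]


# Peptide of DKFZphtes3_72kll, frame 1, as printed above (233 residues).
PEPTIDE = ("MATPPFRLIRKMFSFKVSRWMGLACFRSLAASSPSIRQKKLMHKLQEEKA"
           "FREEMKIFREKIEDFREEMWTFRGKIHAFRGQILGFWEEERPFWEEEKTF"
           "WKEEKSFWEMEKSFREEEKTFWKKYRTFWKEDKAFWKEDNALWERDRNLL"
           "QEDKALWEEEKALWVEERALLEGEKALWEDKTSLWEEENALWEEERAFWM"
           "ENNGHVAGEQMLEDGPHNANRGQRLLAFSRGRA")

zipper_hits = scan("L-x(6)-L-x(6)-L-x(6)-L", PEPTIDE)   # PS00029
pkc_hits = scan("[ST]-x-[RK]", PEPTIDE)                 # PS00005
ck2_hits = scan("[ST]-x(2)-[DE]", PEPTIDE)              # PS00006
print(zipper_hits)
```

The lookahead group is what allows the five leucine zipper hits, which overlap each other in steps of seven residues, to be reported separately.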



BLASTP hits 



Entry SPCC330_4 from database TREMBLNEW: 
gene: "SPCC330.04c"; product: "hypothetical repeat-containing protein"; 
S.pombe chromosome III cosmid c330. 

Score = 149, P = 1.6e-08, identities = 55/187, positives = 88/187 

Entry A45973 from database PIR: 
trichohyalin - human 

Score = 147, P = 3.0e-07, identities = 57/194, positives = 94/194 



Alert BLASTP hits for DKFZphtes3_72kll, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_72kll, frame 1 

Report for DKFZphtes3_72kll.1 



[LENGTH] 233 
[MW] 28752.65 
[pI] 5.70 
[PROSITE] LEUCINE_ZIPPER 5 
[PROSITE] MICROBODIES_CTER 1 
[PROSITE] MYRISTYL 1 
[PROSITE] CK2_PHOSPHO_SITE 3 
[PROSITE] PKC_PHOSPHO_SITE 4 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 15.45 % 



SEQ MATPPFRLIRKMFSFKVSRWMGLACFRSLAASSPSIRQKKLMHKLQEEKAFREEMKIFRE 
SEG 
PRD cccccchhhhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhh 

SEQ KIEDFREEMWTFRGKIHAFRGQILGFWEEERPFWEEEKTFWKEEKSFWEMEKSFREEEKT 
SEG xxxxxxxxxxxxxxxxxxxxxxxx 
PRD hhhhhhhhhhhhhhhhcccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhh 

SEQ FWKKYRTFWKEDKAFWKEDNALWERDRNLLQEDKALWEEEKALWVEERALLEGEKALWED 
SEG 
PRD hhhhcccccccccchhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KTSLWEEENALWEEERAFWMENNGHVAGEQMLEDGPHNANRGQRLLAFSRGRA 
SEG xxxxxxxxxxxx 
PRD ccchhhhhhhhhhhhhhhhhhccccchhhhhhcccccccccchhhhhhhhccc 



Prosite for DKFZphtes3_72kll.1 

PS00005 14->17 PKC_PHOSPHO_SITE PDOC00005 
PS00005 35->38 PKC_PHOSPHO_SITE PDOC00005 
PS00005 71->74 PKC_PHOSPHO_SITE PDOC00005 
PS00005 113->116 PKC_PHOSPHO_SITE PDOC00005 
PS00006 106->110 CK2_PHOSPHO_SITE PDOC00006 
PS00006 113->117 CK2_PHOSPHO_SITE PDOC00006 
PS00006 183->187 CK2_PHOSPHO_SITE PDOC00006 
PS00008 81->87 MYRISTYL PDOC00008 
PS00342 231->234 MICROBODIES_CTER PDOC00299 
PS00029 142->164 LEUCINE_ZIPPER PDOC00029 
PS00029 149->171 LEUCINE_ZIPPER PDOC00029 
PS00029 156->178 LEUCINE_ZIPPER PDOC00029 
PS00029 163->185 LEUCINE_ZIPPER PDOC00029 
PS00029 170->192 LEUCINE_ZIPPER PDOC00029 



(No Pfam data available for DKFZphtes3_72kll.1) 

DKFZphtes3_72kl5 



group: cell structure and motility 

DKFZphtes3_72kl5 encodes a novel 188 amino acid protein with strong similarity to Rattus 
norvegicus actin-filament binding protein Frabin. 

FGD1-related F-actin-binding protein (Frabin/FGD1) is a novel F-actin-binding protein. The 
FGD1 gene locus seems to be responsible for faciogenital dysplasia (Aarskog-Scott syndrome). 
Frabin binds F-actin and shows F-actin-cross-linking activity. Overexpression of frabin in 
Swiss 3T3 cells and COS7 cells induces cell shape change and c-Jun N-terminal kinase 
activation, as described for FGD1. Because FGD1 has been shown to serve as a GDP/GTP exchange 
protein for the Cdc42 small G protein, it is likely that frabin is a direct linker between 
Cdc42 and the actin cytoskeleton. Cdc42p is an essential GTPase. In yeast, Cdc42p transduces 
signals to the actin cytoskeleton to initiate and maintain polarized growth and to 
mitogen-activated protein kinase pathways that influence morphogenesis. In mammalian cells, 
Cdc42p regulates a variety of actin-dependent events and induces the JNK/SAPK protein kinase 
cascade, which leads to the activation of transcription factors within the nucleus. 

The novel protein seems to be the human orthologue of rat frabin. 

The new protein can find application in modulating cell structure and motility as well as in 
modulating the JNK/SAPK pathway. 



strong similarity to actin-filament binding protein Frabin 

2 EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1845 bp 

Poly A stretch at pos. 1835, polyadenylation signal at pos. 1816 



1 GTGATGGAGA GTGCTGTTAT GATAGATGAA TCTAGGAAAG CCTCTTTGGA 
51 GATGTGATAC CTGAACAGAA CCCCGAATGA TAAGAAGAAA TACCAGTGTT 
101 TTAGGAGAGA TTGTCCTAAG CAGAGAACAG CAGCTGCAAA GACCCCAAGA 
151 CACATACACT TGGTTATTAA GAATGGGAGC AGCAAGGAGT ATGGCAAGAA 
201 CACAGTGAGT TTTCCCTTGA GTGTGTGAGG AAGCCCTCAG AGTTTGTGAC 
251 TGACTTGTAG AGGTTCTAGT GGAGGGGATC AGAGTGGAAA CAAAGAGACC 
301 AGTTAAAAAG GTATGGCAGC ATGAATAAAA AAGTTTTGAG AGTATTCATT 
351 ATGCCTTCCA AATAAAAAAC TCTTTGGTTC ATAATTTGTT CATAAATTAA 
401 GGACTGGCTA CACTGTACTA TTTAAAAATG TTAAGAAACA TCAATAAGTA 
451 AAAATGTTAG GAAGAGATGA TAAATACGTA AGTATTATAT CTAACTAAGT 
501 CTTTACTAAC TAGTCACATT ATTAAACAGT GCAAGGATCA AGAAAAGTTA 
551 AGCGTTGAAA AATAAATAAA TAAGTTATAA ATAAAATAAA CAGCCCAAGG 
601 AAATGTTCCA GTCCCCATAG GTAGACTCGG GGTCATCTTC TTTATTTAAA 
651 TCTTTATTTA AATGTGGATA GCATCCCAAG AGACTTGGGT CTACACTAAG 
701 AATATTCAAA TCCATGTTTC TGAAACCATC AGAGATAGAA AAAAAAAGTA 
751 GCGAATATCC CTTTTCAACT GGAATAAACT TGTCTTAATT CTAGAACTTT 
801 TCCATACCAA TGTTTTCATG CTTCCTTTGT ATTTTATCTT TTAGCTCATT 
851 ATCAAATTAT AGTGATTTGA AGAAAGAGTC TGCTGTGAAC CTAAATGCTC 
901 CTAGAACCCC AGGAAGGCAT GGATTGACAA CCACACCTCA ACAAAAACTC 
951 CTCTCCCAGC ACTTGCCACA GAGGCAGGGA AATGATACAG ATAAGACTCA 
1001 GGGTGCACAG ACTTGTGTGG CCAACGGTGT AATGGCAGCA CAAAACCAGA 
1051 TGGAATGTGA GGAGGAGAAA GCTGCCACTC TTAGCTCAGA TACTTCTATT 
1101 CAAGCTTCTG AACCCTTGCT TGATACGCAC ATAGTGAATG GAGAAAGAGA 
1151 TGAAACTGCC ACAGCTCCTG CATCACCCAC AACAGATAGC TGTGATGGAA 
1201 ATGCTTCTGA CAGTAGCTAC AGGACTCCAG GCATAGGCCC AGTGCTCCCC 
1251 CTAGAAGAAA GAGGGGCAGA AACAGAAACC AAGGTACAAG AGAGGGAAAA 
1301 TGGGGAAAGC CCTCTGGAAC TGGAGCAGCT GGACCAGCAC CATGAGATGA 
1351 AGGTAGAGCA TGAGACTAGC TCATGAGCAG GGAAAACCCT GCCTATTCGA 
1401 TTGTTGTCTT AAAACTCTTT ATTTATTGCA CCCCTGAAAT GTATGAATCA 
1451 GATCACCCAC ACTGGCAGTT AAACGATTTT CAAGCTCTGG CTGCTGATTA 
1501 GCATTTCCCC TATGCTCTAA GCAGATATTT CACTTTTTCT TTTCATGTAG 
1551 TTTCTGTTAA TATCTCTGTT GTAATTTCAG GAGTCAGAAC AGTGTGGAAA 
1601 CTTTAATATA GGAAATCCAC AAATGTATTG TTTTTACATA GAAAGAAAAT 
1651 GTTCCTTGTT GCTCTAGATG TTGGTGCTGT ATCCCTAATA CTTACGGGCC 
1701 AAGCAAGAAG AAATTGTATA ATCTTTGTTG TTCAGAAGTT TCTAATAGAA 
1751 TAAATAGGCC TGTAAGATGA ACTTGCCACT AGTAAATGTT ACTTTTAAGG 
1801 ACATGAATAT GGAAGTATTA AATTATTCAA CAGATAAAAA AAAAA 



BLAST Results 



No BLAST result 
Medline entries 



98334590: 

Frabin, a novel FGD1-related actin filament-binding protein capable of 
changing cell shape and activating c-Jun N-terminal kinase. 



Peptide information for frame 3 



ORF from 810 bp to 1373 bp; peptide length: 188 
Category: similarity to known protein 
Classification: Cell structure/motility 

1 MFSCFLCILS FSSLSNYSDL KKESAVNLNA PRTPGRHGLT TTPQQKLLSQ 
51 HLPQRQGNDT DKTQGAQTCV ANGVMAAQNQ MECEEEKAAT LSSDTSIQAS 
101 EPLLDTHIVN GERDETATAP ASPTTDSCDG NASDSSYRTP GIGPVLPLEE 
151 RGAETETKVQ ERENGESPLE LEQLDQHHEM KVEHETSS 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_72kl5, frame 3 

TREMBL:AF038388_1 product: "actin-filament binding protein Frabin"; 
Rattus norvegicus actin-filament binding protein Frabin mRNA, complete 
cds., N = 1, Score = 428, P = 1.8e-39 

>TREMBL:AF038388_1 product: "actin-filament binding protein Frabin"; Rattus 
norvegicus actin-filament binding protein Frabin mRNA, complete cds. 
Length = 766 

HSPs: 

Score = 428 (64.2 bits), Expect = 1.8e-39, P = 1.8e-39 
Identities = 90/174 (51%), Positives = 115/174 (66%) 

Query: 12 SSLSNYSDLKKESAVNLNAPRTPGRHGLTTTPQQKLLSQHLPQRQGNDTDKTQGAQTCVA 71 

S LS+Y+D++K+S +NLN P+TP +HGLT+T QKL S PQ+Q D+D+ QG C+A 
Sbjct: 31 SVLSSYTDVQKDSTMNLNIPQTPRQHGLTSTTPQKLPSHKSPQKQEKDSDQNQGQHGCLA 90 

Query: 72 NGVMAAQNQMECEEEKAATLSSDTSIQASEPLLDTHIVNGERDETATAPASPTTDSCDGN 131 

NGV AAQ+QMECE EK A LS +T Q + D H++NG R+ET T AS T+S D N 
Sbjct: 91 NGVAAAQSQMECETEKEAALSPETDTQTAAASPDAHVLNGVRNETTTDSASSVTNSHDEN 150 

Query: 132 ASDSSYRTPGIGPVLPLEERGAETETKVQERENGESPLELEQLDQHHEMKVEHE 185 

A DSS RT G LP +E E ++QERENG S L LDQHHE+K +E 

Sbjct: 151 ACDSSCRTQGTDLGLPSKEGEPVIEAELQERENGLSTEGLNPLDQHHEVKETNE 204 
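
The percentages in the HSP header above follow directly from the match counts. Judging from the figures in this listing, they appear to be truncated rather than rounded: 90/174 is 51.7% but is reported as 51%. A minimal sketch of that arithmetic, with a hypothetical helper name `blast_percent`, is:

```python
def blast_percent(matches, alignment_length):
    """Integer percentage as it appears in the HSP headers of this listing.

    Uses floor division, i.e. the fractional part is dropped. That this is
    how the reporting program works is an inference from the printed values,
    not something stated in the source.
    """
    if alignment_length <= 0:
        raise ValueError("alignment length must be positive")
    return 100 * matches // alignment_length


identities = blast_percent(90, 174)    # Identities = 90/174
positives = blast_percent(115, 174)    # Positives = 115/174
print(identities, positives)
```

Ordinary rounding would give 52% for the identities here, so the distinction matters when such values are recomputed from an alignment.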



Pedant information for DKFZphtes3_72kl5, frame 3 



Report for DKFZphtes3_72kl5.3 



[LENGTH] 188 

[MW] 20388.32 

[pI] 4.62 

[HOMOL] TREMBL:AF038388_1 product: "actin-filament binding protein Frabin"; Rattus 
norvegicus actin-filament binding protein Frabin mRNA, complete cds. 2e-38 

[KW] All_Alpha 
[KW] SIGNAL_PEPTIDE 16 
[KW] LOW_COMPLEXITY 12.77 % 



SEQ MFSCFLCILSFSSLSNYSDLKKESAVNLNAPRTPGRHGLTTTPQQKLLSQHLPQRQGNDT 

SEG xxxxxxxxxxxxxx 

PRD ccchhhhhcccccccccccccccccccccccccccccccccccchhhhhhhccccccccc 

SEQ DKTQGAQTCVANGVMAAQNQMECEEEKAATLSSDTSIQASEPLLDTHIVNGERDETATAP 
SEG xxxxx 

PRD ccccccceeecchhhhhhhhhhhhhhhhhhhccccceeecccccceeeeecccccccccc 

SEQ ASPTTDSCDGNASDSSYRTPGIGPVLPLEERGAETETKVQERENGESPLELEQLDQHHEM 

SEG xxxxx 

PRD ccccccccccccccccccccccccccccccccchhhhhhhhhcccccchhhhhhhhhhhh 

SEQ KVEHETSS 

SEG 

PRD hhhhhccc 

(No Prosite data available for DKFZphtes3_72kl5.3) 
(No Pfam data available for DKFZphtes3_72kl5.3) 

DKFZphtes3_72pl6 



group: intracellular transport and trafficking 

DKFZphtes3_72pl6 encodes a novel 796 amino acid protein with very strong similarity to the 
Mus musculus maternal-embryonic 3 (Mem3) gene product. 

Mem3 was isolated from a partial subtraction library of mouse unfertilized eggs and 
preimplantation embryos. Its transcript is abundant in the unfertilized egg and is also 
actively transcribed from the newly formed zygotic genome. Like Mem3, the novel protein is 
similar to yeast VPS35 (vacuolar protein sorting 35). In yeast, the null allele of VPS35 
results in a differential defect in the sorting of vacuolar carboxypeptidase Y (CPY), 
proteinase A (PrA), proteinase B (PrB), and alkaline phosphatase (ALP). 

The new protein can find application in modulating the sorting of proteins into different 
compartments. 



strong similarity to mouse MEM3 and yeast VPS35 
Sequenced by DKFZ 
Locus: /map="16p13.3" 
Insert length: 2707 bp 

Poly A stretch at pos. 2697, no polyadenylation signal found 



1 CTACGCGCGG GGCGGGTGCT GCTTGCTGCA GGCTCTGGGG AGTCGCCATG 
51 CCTACAACAC AGCAGTCCCC TCAGGATGAG CAGGAAAAGC TCTTGGATGA 
101 AGCCATACAG GCTGTGAAGG TCCAGTCATT CCAAATGAAG AGATGCCTGG 
151 ACAAAAACAA GCTTATGGAT TCTCTAAAAC ATGCTTCTAA TATGCTTGGT 
201 GAACTCCGGA CTTCTATGTT ATCACCAAAG AGTTACTATG AACTTTATAT 
251 GGCCATTTCT GATGAACTGC ACTACTTGGA GGTCTACCTG ACAGATGAGT 
301 TTGCTAAAGG AAGGAAAGTG GCAGATCTCT ACGAACTTGT ACAGTATGCT 
351 GGAAACATTA TCCCAAGGCT TTACCTTTTG ATCACAGTTG GAGTTGTATA 
401 TGTCAAGTCA TTTCCTCAGT CCAGGAAGGA TATTTTGAAA GATTTGGTAG 
451 AAATGTGCCG TGGTGTGCAA CATCCCTTGA GGGGTCTGTT TCTTCGAAAT 
501 TACCTTCTTC AGTGTACCAG AAATATCTTA CCTGATGAAG GAGAGCCAAC 
551 AGATGAAGAA ACAACTGGTG ACATCAGTGA TTCCATGGAT TTTGTACTGC 
601 TCAACTTTGC AGAAATGAAC AAGCTCTGGG TGCGAATGCA GCATCAGGGA 
651 CATAGCCGAG ATAGAGAAAA AAGAGAACGA GAAAGACAAG AACTGAGAAT 
701 TTTAGTGGGA ACAAATTTGG TGCGCCTCAG TCAGTTGGAA GGTGTAAATG 
751 TGGAACGTTA CAAACAGATT GTTTTGACTG GCATATTGGA GCAAGTTGTA 
801 AACTGTAGGG ATGCTTTGGC TCAAGAATAT CTCATGGAGT GTATTATTCA 
851 GGTTTTCCCT GATGAATTTC ACCTCCAGAC TTTGAATCCT TTTCTTCGGG 
901 CCTGTGCTGA GTTACACCAG AATGTAAATG TGAAGAACAT AATCATTGCT 
951 TTAATTGATA GATTAGCTTT ATTTGCTCAC CGTGAAGATG GACCTGGAAT 
1001 CCCAGCGGAT ATTAAACTTT TTGATATATT TTCACAGCAG GTGGCTACAG 
1051 TGATACAGTC TAGACAAGAC ATGCCTTCAG AGGATGTTGT ATCTTTACAA 
1101 GTCTCTCTGA TTAATCTTGC CATGAAATGT TACCCTGATC GTGTGGACTA 
1151 TGTTGATAAA GTTCTAGAAA CAACAGTGGA GATATTCAAT AAGCTCAACC 
1201 TTGAACATAT TGCTACCAGT AGTGCAGTTT CAAAGGAACT CACCAGACTT 
1251 TTGAAAATAC CAGTTGACAC TTACAACAAT ATTTTAACAG TCTTGAAATT 
1301 AAAACATTTT CACCCACTCT TTGAGTACTT TGACTACGAG TCCAGAAAGA 
1351 GCATGAGTTG TTATGTGCTT AGTAATGTTC TGGATTATAA CACAGAAATT 
1401 GTCTCTCAAG ACCAGGTGGA TTCCATAATG AATTTGGTAT CCACGTTGAT 
1451 TCAAGATCAG CCAGATCAAC CTGTAGAAGA CCCTGATCCA GAAGATTTTG 
1501 CTGATGAGCA GAGCCTTGTG GGCCGCTTCA TTCATCTGCT GCGCTCTGAG 
1551 GACCCTGACC AGCAGTACTT GATTTTGAAC ACAGCACGAA AACATTTTGG 
1601 AGCTGGTGGA AATCAGCGGA TTCGCTTCAC ACTGCCACCT TTGGTATTTG 
1651 CAGCTTACCA GCTGGCTTTT CGATATAAAG AGAATTCTAA AGTGGATGAC 
1701 AAATGGGAAA AGAAATGCCA GAAGATTTTT TCATTTGCCC ACCAGACTAT 
1751 CAGTGCTTTG ATCAAAGCAG AGCTGGCAGA ATTGCCCTTA AGACTTTTTC 
1801 TTCAAGGAGC ACTAGCTGCT GGGGAAATTG GTTTTGAAAA TCATGAGACA 
1851 GTCGCATATG AATTCATGTC CCAGGCATTT TCTCTGTATG AAGATGAAAT 
1901 CAGCGATTCC AAAGCACAGC TAGCTGCCAT CACCTTGATC ATTGGCACTT 
1951 TTGAAAGGAT GAAGTGCTTC AGTGAAGAGA ATCATGAACC TCTGAGGACT 
2001 CAGTGTGCCC TTGCTGCATC CAAACTTCTA AAGAAACCTG ATCAGGGCCG 
2051 AGCTGTGAGC ACCTGTGCAC ATCTCTTCTG GTCTGGCAGA AACACGGACA 
2101 AAAATGGGGA GGAGCTTCAC GGAGGCAAGA GGGTAATGGA GTGCCTAAAA 
2151 AAAGCTCTAA AAATAGCAAA TCAGTGCATG GACCCCTCTC TACAAGTGCA 
2201 GCTTTTTATA GAAATTCTGA ACAGATATAT CTATTTTTAT GAAAAGGAAA 
2251 ATGATGCGGT AACAATTCAG GTTTTAAACC AGCTTATCCA AAAGATTCGA 
2301 GAAGACCTCC CGAATCTTGA ATCCAGTGAA GAAACAGAGC AGATTAACAA 
2351 ACATTTTCAT AACACACTGG AGCATTTGCG CTTGCGGCGG GAATCACCAG 
2401 AATCCGAGGG GCCAATTTAT GAAGGTCTCA TCCTTTAAAA AGGAAATAGC 
2451 TCACCATACT CCTTTCCATG TACATCCAGT GAGGGTTTTA TTACGCTAGG 
2501 TTTCCCTTCC ATAGATTGTG CCTTTCAGAA ATGCTGAGGT AGGTTTCCCA 
2551 TTTCTTACCT GTGATGTGTT TTACCCAGCA CCTCCGGACA CTCACCTTCA 
2601 GGACCTTAAT AAAATTATTC ACTTGGTAAG TGTTCAAGTC TTTCTGATCA 
2651 CCCCAAGTAG CATGACTGAT CTGCAATTTA AAATTCCTGT GATCTGTAAA 
2701 AAAAAAA 



BLAST Results 



Entry AC007225 from database EMBLNEW: 

Homo sapiens chromosome 16 clone 480G7, WORKING DRAFT SEQUENCE, 38 
unordered pieces. 

Score = 1081, P = 2.3e-217, identities = 219/221 
13 exons 

Entry HS015146 from database EMBL: 
human STS WI-8848. 
Score = 2033, P = 2.9e-87, identities = 425/436 



Medline entries 



96327632: 

Genetic mapping and embryonic expression of a novel, maternally 
transcribed gene Mem3. 

97258867: 

Endosome to Golgi retrieval of the vacuolar protein sorting receptor, 
Vps10p, requires the function of the VPS29, VPS30, and VPS35 gene products. 

92360909: 

Alternative pathways for the sorting of soluble vacuolar proteins in 
yeast: a vps35 null mutant missorts and 
secretes only a subset of vacuolar hydrolases. 

10198044: 

Distinct Domains within Vps35p Mediate the Retrieval of Two Different 
Cargo Proteins from the Yeast Prevacuolar/Endosomal Compartment 



Peptide information for frame 3 



ORF from 48 bp to 2435 bp; peptide length: 796 
Category: strong similarity to known protein 
Classification: unset 
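
The ORF coordinates, reading frame and peptide length given for each clone are tied together by simple arithmetic: with 1-based inclusive coordinates, the span from start to end must be a whole number of codons, and the frame follows from the start position. The sketch below illustrates this with the four ORFs in this part of the listing; the function name `orf_summary` is hypothetical, and the observation that the stop codon lies outside the stated span is inferred from the printed numbers.

```python
def orf_summary(start, end):
    """Return (frame, codon_count) for an ORF given as 1-based inclusive bp.

    frame is 1, 2 or 3 according to the position of the first base;
    codon_count equals the peptide length when the stop codon is excluded
    from the span, as the figures in this listing suggest.
    """
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("span is not a positive multiple of three")
    frame = (start - 1) % 3 + 1
    return frame, length // 3


orfs = {
    "DKFZphtes3_6dl6": (107, 2191),
    "DKFZphtes3_72kll": (268, 966),
    "DKFZphtes3_72kl5": (810, 1373),
    "DKFZphtes3_72pl6": (48, 2435),
}
for clone, (start, end) in orfs.items():
    print(clone, orf_summary(start, end))
```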



1 MPTTQQSPQD EQEKLLDEAI QAVKVQSFQM KRCLDKNKLM DSLKHASNML 

51 GELRTSMLSP KSYYELYMAI SDELHYLEVY LTDEFAKGRK VADLYELVQY 

101 AGNIIPRLYL LITVGVVYVK SFPQSRKDIL KDLVEMCRGV QHPLRGLFLR 

151 NYLLQCTRNI LPDEGEPTDE ETTGDISDSM DFVLLNFAEM NKLWVRMQHQ 

201 GHSRDREKRE RERQELRILV GTNLVRLSQL EGVNVERYKQ IVLTGILEQV 

251 VNCRDALAQE YLMECIIQVF PDEFHLQTLN PFLRACAELH QNVNVKNIII 

301 ALIDRLALFA HREDGPGIPA DIKLFDIFSQ QVATVIQSRQ DMPSEDVVSL 

351 QVSLINLAMK CYPDRVDYVD KVLETTVEIF NKLNLEHIAT SSAVSKELTR 

401 LLKIPVDTYN NILTVLKLKH FHPLFEYFDY ESRKSMSCYV LSNVLDYNTE 

451 IVSQDQVDSI MNLVSTLIQD QPDQPVEDPD PEDFADEQSL VGRFIHLLRS 

501 EDPDQQYLIL NTARKHFGAG GNQRIRFTLP PLVFAAYQLA FRYKENSKVD 

551 DKWEKKCQKI FSFAHQTISA LIKAELAELP LRLFLQGALA AGEIGFENHE 

601 TVAYEFMSQA FSLYEDEISD SKAQLAAITL IIGTFERMKC FSEENHEPLR 

651 TQCALAASKL LKKPDQGRAV STCAHLFWSG RNTDKNGEEL HGGKRVMECL 

701 KKALKIAKQC MDPSLQVQLF IEILNRYIYF YEKENDAVTI QVLNQLIQKI 

751 REDLPNLESS EETEQINKHF HNTLEHLRLR RESPESEGPI YEGLIL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_72pl6, frame 3 

TREMBL:AF024504_3 gene: "A_TM017A05.7"; Arabidopsis thaliana BAC 
TM017A05., N = 2, Score = 927, P = 1.9e-162 

PIR:S56936 vacuolar protein-sorting protein VPS35 - yeast 
(Saccharomyces cerevisiae), N = 3, Score = 826, P = 1.5e-116 

TREMBL:MM47024_1 gene: "Mem3"; product: "MEM3"; Mus musculus 
maternal-embryonic 3 (Mem3) mRNA, complete cds., N = 1, Score = 3376, P = 0 

TREMBL:S42186_1 gene: "VPS35"; product: "Vps35p"; VPS35=vacuolar 
protein sorting [Saccharomyces cerevisiae=yeast, Genomic, 3790 nt], N = 
3, Score = 813, P = 4.4e-115 

>TREMBL:MM47024_1 gene: "Mem3"; product: "MEM3"; Mus musculus 
maternal-embryonic 3 (Mem3) mRNA, complete cds. 
Length = 754 

HSPs: 

Score = 3376 (506.5 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 666/721 (92%), Positives = 682/721 (94%) 

Query: 78 EVYLTDEFAKGRKVADLYELVQYAGNIIPRLYLLITVGVVYVKSFPQSRKDILKDLVEMC 137 
+VYLTDEFAKG ++ADLYELVQY+GNIIPRLYLLITVGVVYVKSFPQSRKDILKDLVEMC 
Sbjct: 34 KVYLTDEFAKGERLADLYELVQYSGNIIPRLYLLITVGVVYVKSFPQSRKDILKDLVEMC 93 

Query: 138 RGVQHPLRGLFLRNYLLQCTRNILPDEGEPTDEETTGDISDSMDFVLLNFAEMNKLWVRM 197 
RGVQHPLRGLFLRNYLLQCTRNILPDEGEPTDEETTGDISDSMDFVLLNFAEMNKLWVRM 
Sbjct: 94 RGVQHPLRGLFLRNYLLQCTRNILPDEGEPTDEETTGDISDSMDFVLLNFAEMNKLWVRM 153 

Query: 198 QHQGHSRDREKRERERQELRILVGTNLVRLSQLEG-VNVERYKQIVLTGILEQVVNCRDA 256 
QHQGHSRDREKRERERQELRILVGTNLV L+ + +QIVLTGILEQVVNCRDA 
Sbjct: 154 QHQGHSRDREKRERERQELRILVGTNLVALTLVSWRCKCGTLQQIVLTGILEQVVNCRDA 213 

Query: 257 LAQEYLMECIIQVFPDEFHLQTLNPFLRACAELHQNVNVKNIIIALIDRLALFAHREDGP 316 
LAQE MECIIQVFPDEFHLQTLNPFLRACAELHQNVNVKNIIIALIDRLALFAHRE P 
Sbjct: 214 LAQEISMECIIQVFPDEFHLQTLNPFLRACAELHQNVNVKNIIIALIDRLALFAHREMEP 273 

Query: 317 GIPADIKLFDIFSQQVATVIQSRQDMPSEDVVSLQVSLINLAMKCYPDRVDYVDKVLETT 376 
GIPA++KLFDIFSQQVATVIQSR+DMPSEDVVSLQVSLINLAMKCYPDRVDYVDKVLETT 
Sbjct: 274 GIPAELKLFDIFSQQVATVIQSRRDMPSEDVVSLQVSLINLAMKCYPDRVDYVDKVLETT 333 

Query: 377 VEIFNKLNLEHIATSSAVSKELTRLLKIPVDTYNNILTVLKLKHFHPLFEYFDYESR--K 434 
VEIFNKLNLEHIATSSAVSKELTRLLKIPVDTYNNILTVLKLKHFHPLFEYFDYES K 
Sbjct: 334 VEIFNKLNLEHIATSSAVSKELTRLLKIPVDTYNNILTVLKLKHFHPLFEYFDYESSPGK 393 

Query: 435 SMSCYVLSNVLDYNTEIVSQDQVDSIMNLVSTLIQDQPDQPVEDPDPEDFADEQSLVGRF 494 
SMSCYVLSNVLDYNTEIVSQDQVDSIMNLVSTLIQDQPDQPVEDPDPEDFADEQSLVGRF 
Sbjct: 394 SMSCYVLSNVLDYNTEIVSQDQVDSIMNLVSTLIQDQPDQPVEDPDPEDFADEQSLVGRF 453 

Query: 495 IHLLRSEDPDQQYLILNTARKHFGAGGNQRIRFTLPPLVFAAYQLAFRYKENSKVDDKWE 554 
IHLLRS+DPDQQYLILNTARKHFGAGGNQRIRFTLPPLVFAAYQLAFRYKENSK + 
Sbjct: 454 IHLLRSDDPDQQYLILNTARKHFGAGGNQRIRFTLPPLVFAAYQLAFRYKENSKWMTSGK 513 

Query: 555 KKCQKIFSFAHQTISALIKAELAELPLRLFLQGALAAGEIGFENHETVAYEFMSQAFSLY 614 
+ ++ F HQTISALIKAELAELPLRLFLQGALAAGEIGFENHETVAYEFMSQAFSLY 
Sbjct: 514 RNARRYFHLPHQTISALIKAELAELPLRLFLQGALAAGEIGFENHETVAYEFMSQAFSLY 573 

Query: 615 EDEISDSKAQLAAITLIIGTFERMKCFSEENHEPLRTQCALAASKLLKKPDQGRAVSTCA 674 
EDEISDSKAQLAAITLIIGTFERMKCFSEENHEPLRT+CALAASKLLKKPDQ C 
Sbjct: 574 EDEISDSKAQLAAITLIIGTFERMKCFSEENHEPLRTECALAASKLLKKPDQAEREHMCT 633 

Query: 675 HLFWSGRNTDKNGEELHGGKRVMECLKKALKIANQCMDPSLQVQLFIEILNRYIYFYEKE 734 
L WSGRNTDKNGEELHGGKRVMECLKKALKIANQCMDPSLQVQLFIEILNRYIYFYEKE 
Sbjct: 634 SL-WSGRNTDKNGEELHGGKRVMECLKKALKIANQCMDPSLQVQLFIEILNRYIYFYEKE 692 

Query: 735 NDAVTIQVLNQLIQKIREDLPNLESSEETEQINKHFHNTLEHLRLRRESPESEGPIYEGL 794 
NDAVTIQVLNQLIQKIREDLPNLESSEETEQINKHFHNTLEHLR RRESPESEGPIYEGL 
Sbjct: 693 NDAVTIQVLNQLIQKIREDLPNLESSEETEQINKHFHNTLEHLRTRRESPESEGPIYEGL 752 

Query: 795 IL 796 
IL 
Sbjct: 753 IL 754 

Pedant information for DKFZphtes3_72pl6, frame 3 
Report for DKFZphtes3_72pl6.3



[LENGTH] 796
[MW] 91723.67
[pI] 5.32
[HOMOL] TREMBL:MM47024_1 gene: "Mem3"; product: "MEM3"; Mus musculus maternal-embryonic 3 (Mem3) mRNA, complete cds. 0.0
[FUNCAT] 30.25 vacuolar and lysosomal organization [S. cerevisiae, YJL154c] 1e-110
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YJL154c] 1e-110
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YJL154c] 1e-110
[FUNCAT] 30.22 endosomal organization [S. cerevisiae, YJL154c] 1e-110
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YJL154c] 1e-110
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YJL154c] 1e-110
[FUNCAT] 09.07 biogenesis of endoplasmatic reticulum [S. cerevisiae, YJL154c] 1e-110
[BLOCKS] BL01092Q
[PIRKW] yeast vacuole 1e-108
[PIRKW] membrane protein 1e-108
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 5.40 %



SEQ MPTTQQSPQDEQEKLLDEAIQAVKVQSFQMKRCLDKNKLMDSLKHASNMLGELRTSMLSP
SEG
PRD cccccccccchhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhcccc
MEM

SEQ KSYYELYMAISDELHYLEVYLTDEFAKGRKVADLYELVQYAGNIIPRLYLLITVGWYVK
SEG
PRD cceeeeehhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhcccccccceeeeeceeeee
MEM MMMMMMMMMMMMMM

SEQ SFPQSRKDILKDLVEMCRGVQHPLRGLFLRNYLLQCTRNILPDEGEPTDEETTGDISDSM
SEG xxxxxxxxxxxxxx
PRD ecccchhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhcccccccccccccccccccch
MEM MMMMMMMMMM

SEQ DFVLLNFAEMNKLWVRMQHQGHSRDREKRERERQELRILVGTNLVRLSQLEGVNVERYKQ
SEG xxxxxxxxxxxxxx
PRD hhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhccchhhhhhhhccchhhhhh
MEM

SEQ IVLTGILEQVVNCRDALAQEYLMECIIQVFPDEFHLQTLNPFLRACAELHQNVNVKNIII
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhccccchhhhhh
MEM

SEQ ALIDRLALFAHREDGPGIPADIKLFDIFSQQVATVIQSRQDMPSEDVVSLQVSLINLAMK
SEG
PRD hhhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhccccccccchhhhhhhhhhhhh
MEM

SEQ CYPDRVDYVDKVLETTVEIFNKLNLEHIATSSAVSKELTRLLKIPVDTYNNILTVLKLKH
SEG
PRD cccccccchhhhhhhhhhhhhccchhhhhhccchhhhhhhhhccccccchhhhhhhhhhh
MEM

SEQ FHPLFEYFDYESRKSMSCYVLSNVLDYNTEIVSQDQVDSIMNLVSTLIQDQPDQPVEDPD
SEG xxxxxxxxxxxx
PRD hhhheeecccchhhhhhhhhhhhccccceeehhhhhhhhhhhhhhhhhhccccccccccc
MEM

SEQ PEDFADEQSLVGRFIHLLRSEDPDQQYLILNTARKHFGAGGNQRIRFTLPPLVFAAYQLA
SEG xxx
PRD ccccchhhhhhhhhhhhhhccccchhhhhhhhhhhhhcccccceeeeeccchhhhhhhhh
MEM

SEQ FRYKENSKVDDKWEKKCQKIFSFAHQTISALIKAELAELPLRLFLQGALAAGEIGFENHE
SEG
PRD hhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccc
MEM

SEQ TVAYEFMSQAFSLYEDEISDSKAQLAAITLIIGTFERMKCFSEENHEPLRTQCALAASKL
SEG
PRD eeeeehhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhh
MEM

SEQ LKKPDQGRAVSTCAHLFWSGRNTDKNGEELHGGKRVMECLKKALKIANQCMDPSLQVQLF
SEG
PRD hhcccceeeeecccccccccccccccccccccchhhhhhhhhhhhhhhhhhchhhhhhhh
MEM

SEQ IEILNRYIYFYEKENDAVTIQVLNQLIQKIREDLPNLESSEETEQINKHFHNTLEHLRLR
SEG
PRD hhhhhhhhhhhccccceeeeehhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhh
MEM

SEQ RESPESEGPIYEGLIL
SEG
PRD hhcccccccceeeccc
MEM



(No Prosite data available for DKFZphtes3_72pl6.3)
(No Pfam data available for DKFZphtes3_72pl6.3)
group: cell structure and motility

DKFZphtes3_7b22 encodes a novel 443 amino acid protein with weak similarity to paramyosins.

The novel protein is related to paramyosin, a major structural component of the thick filaments
of invertebrate muscle. Paramyosins are promising antigens for immunization against several
parasites, such as Schistosoma mansoni.

The new protein can find application in modulating cell adhesion/motility and
membrane/cytoskeleton structure and dynamics.



similarity to paramyosins 

complete cDNA, complete cds, few EST hits 

Sequenced by BMFZ 

Locus: /map="3" 

Insert length: 2291 bp 

Poly A stretch at pos. 2241, polyadenylation signal at pos. 2213 



1 GGAAGAAAGG CTAGCGGGCG TTGGCCGTAT GTGGGTGTCT TGAGGCAGTT 
51 TTTCAGTTCT TTCATTTACC AAAGTGACAT GCACCTACTA GGTGCCAGGT 
101 GTTTAGACGT ACATACAACC CTCTGCAAAA TCTTTCAGTG TAGTCCTCTG 
151 TATGAAAAGT TTCCAGCCAA GAATTGCCAC TGCACCTGAG ATAAGGGGGA 
201 TCCTGGCCAT TAAGGAAACC TTGCCTTCGA AACTGAGCCG TGAGGAACTA 
251 TACAAAATGG GAAATTGGGA CAAATCCCAG TGGCTCATGA CACTAAGAAG 
301 TAAAATTACG AACTCACTGA GCTGGAAGTC ATTCAACGGG AATTGAATAG 
351 GTAACTGCAC TTTTGTGAGA TTATAAATAT ACCACGGAGG GTAACGAAGC 
401 TACAGAAGAA TGGAAGAAGA CAGCCTGGAA GACTCAAACC TTCCTCCAAA 
451 AGTTTGGCAT TCTGAGATGA CGGTGTCAGT GACAGGCGAA CCACCTAGTA 
501 CCGTAGAAGA AGAAGGAATA CCTAAAGAAA CAGACATAGA AATCATCCCA 
551 GAAATCCCGG AAACTCTAGA GCCACTGTCC CTTCCAGATG TGCTGAGGAT 
601 CTCGGCAGTT CTGGAGGACA CCACAGACCA GCTCTCTATT CTGAACTACA 
651 TCATGCCCGT TCAGTACGAA GGGAGACAGA GCATCTGCGT GAAAAGCAGA 
701 GAAATGAATC TAGAAGGAAC GAATCTAGAC AAACTTCCAA TGGCCTCAAC 
751 AATCACAAAA ATACCCAGTC CGTTAATAAC TGAGGAAGGA CCCAACTTGC 
801 CAGAAATCAG ACACAGAGGC CGGTTCGCTG TGGAGTTTAA CAAAATGCAG 
851 GATCTTGTCT TCAAAAAACC TACAAGGCAG ACCATCATGA CTACGGAGAC 
901 ACTGAAGAAA ATTCAGATTG ATAGGCAGTT TTTCAGCGAT GTGATTGCAG 
951 ATACCATTAA GGAGTTGCAA GATTCGGCCA CTTACAACAG TCTCCTGCAA 
1001 GCTTTGAGCA AAGAGAGGGA AAACAAAATG CATTTCTATG ACATCATTGC 
1051 CAGGGAGGAA AAAGGAAGAA AACAGATAAT ATCACTTCAA AAACAGCTAA 
1101 TTAATGTCAA AAAGGAATGG CAATTTGAAG TCCAGAGTCA GAATGAGTAT 
1151 ATTGCTAACC TCAAGGACCA ACTGCAAGAG ATGAAGGCAA AATCCAACTT 
1201 GGAGAATCGC TACATGAAAA CCAATACCGA GCTGCAGATT GCCCAGACCC 
1251 AGAAAAAGTG TAACAGAACA GAGGAACTCT TGGTGGAAGA GATTGAGAAA 
1301 CTCAGGATGA AAACCGAAGA AGAGGCCCGG ACTCATACAG AGATTGAAAT
1351 GTTCCTTAGA AAGGAGCAGC AGAAACTTGA GGAGAGGCTG GAGTTCTGGA 
1401 TGGAGAAATA CGATAAGGAC ACAGAAATGA AACAGAATGA ACTAAATGCT 
1451 CTCAAAGCCA CAAAGGCCAG TGACTTAGCA CACCTTCAAG ACCTGGCAAA 
1501 GATGATAAGA GAGTATGAAC AGGTCATCAT TGAAGATCGT ATAGAAAAGG 
1551 AGAGGAGCAA GAAGAAGGTA AAACAGGATC TCTTGGAATT AAAGAGCGTT 
1601 ATAAAGCTCC AGGCCTGGTG GCGAGGCACT ATGATACGGA GAGAAATTGG 
1651 TGGTTTCAAG ATGCCTAAAG ACAAAGTTGA TAGCAAGGAT TCAAAAGGCA 
1701 AAGGTAAAGG CAAGGATAAG AGGAGAGGCA AGAAGAAGTG ACCAAGTTCT 
1751 CTTTTGTGTT TTCTGCTGGT ATTCTGGAGG TGGGAAGGAC TTGGAGAGTT 
1801 AAGAAACACC TGGTACCTCA AAGATGACTC ATCTACAGGT TGTTTCCTAT 
1851 TGAGACTTTC CCAGGGAAGC CTGATTTCAC TTTGCCTGTT AATTTCACTC 
1901 TGCCTGTTAG GTGGGTTTTC AAACCCTGAT TTAGGATTAC ACCATTGACT 
1951 TAGGGCTTCC TCATACCTTG CTGGGAAGAA GTTTCTAGTA GTCCTGTGAA 
2001 GATTCATTCT TCTTGCTCTT TCTCAGCAGA ACAAAGGAGT TCACTGGCTT 
2051 AGCTACAGTG ACGCATTGAA ACTTGAGTAA TTCCTGTAAT GTCAGATTTT 
2101 GATTTTACCC AATTTGTCTG TAGTGAAAAA ACTCTTATGA GCAAAAGTAT 
2151 TCAGTAGGAA TTACAATATG ATGTTATTAG CTGTCCAGCA TAATATATAC 
2201 ACAGCAAAGT TTTAATAAAT GTTGGTTCCT GCCTGCCTTT TAAAAAAAAA 
2251 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA A 



BLAST Results 



Entry G36731 from database EMBL: 
SHGC-52923 Human Homo sapiens STS cDNA.
Score = 2262, P = 1.3e-97, identities = 462/468



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 410 bp to 1738 bp; peptide length: 443 
Category: similarity to known protein 
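
As a consistency check on the ORF line above, the peptide length and reading frame can be recomputed from the coordinates. The sketch below is illustrative only: the function name `orf_summary` is hypothetical, and it assumes 1-based inclusive coordinates that exclude the stop codon, with the frame numbered 1 to 3 from the start position. Those assumptions agree with the figures reported in this listing.

```python
def orf_summary(start: int, end: int) -> tuple[int, int]:
    """Return (peptide_length, frame) for an ORF.

    Assumes 1-based, inclusive nucleotide coordinates without the stop codon.
    """
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    frame = (start - 1) % 3 + 1  # frame 1, 2 or 3 relative to position 1
    return span // 3, frame


# ORF coordinates as reported for DKFZphtes3_7b22 (410..1738)
length, frame = orf_summary(410, 1738)
print(f"peptide length: {length}, frame: {frame}")
```

The same call with the coordinates 176 and 2074 reported for DKFZphtes3_7dl7 gives a 633-residue peptide in frame 2.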



1 MEEDSLEDSN LPPKVWHSEM TVSVTGEPPS TVEEEGIPKE TDIEIIPEIP 
51 ETLEPLSLPD VLRISAVLED TTDQLSILNY IMPVQYEGRQ SICVKSREMN 
101 LEGTNLDKLP MASTITKIPS PLITEEGPNL PEIRHRGRFA VEFNKMQDLV 
151 FKKPTRQTIM TTETLKKIQI DRQFFSDVIA DTIKELQDSA TYNSLLQALS 
201 KERENKMHFY DIIAREEKGR KQIISLQKQL INVKKEWQFE VQSQNEYIAN
251 LKDQLQEMKA KSNLENRYMK TNTELQIAQT QKKCNRTEEL LVEEIEKLRM 
301 KTEEEARTHT EIEMFLRKEQ QKLEERLEFW MEKYDKDTEM KQNELNALKA 
351 TKASDLAHLQ DLAKMIREYE QVIIEDRIEK ERSKKKVKQD LLELKSVIKL 
401 QAWWRGTMIR REIGGFKMPK DKVDSKDSKG KGKGKDKRRG KKK 

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_7b22, frame 2

SWISSPROT:MYSP_BRUMA PARAMYOSIN., N = 1, Score = 158, P = 5.8e-08

PIR:A44972 paramyosin - nematode (Dirofilaria immitis) (fragment), N = 1, Score = 157, P = 7.1e-08

SWISSPROT:MYSP_ONCVO PARAMYOSIN., N = 1, Score = 157, P = 7.4e-08

PIR:S52537 emm L 15 protein - Streptococcus pyogenes, N = 1, Score = 151, P = 8.6e-08



>SWISSPROT:MYSP_BRUMA PARAMYOSIN. 
Length = 880 

HSPs: 



Score = 158 (23.7 bits), Expect = 5.8e-08, P = 5.8e-08
Identities = 66/259 (25%), Positives = 125/259 (48%)

Query: 142 EFNKMQDLVFKKPTRQTIMTTETLKKIQIDRQFFSDVIADTIKELQDSATYNSLLQALSK 201

+ K + LK R TE K++ + +D +A + LQ A N LL+ +

Sbjct: 169 QLKKDKHLAEKAAERFEAQTVELSNKVEDLNRHVND-LAQQRQRLQ--AENNDLLKEIHD 225

Query: 202 ER ENKMHF-YDIIAREEKGRKQIISLQKQLINVKKEWQFEVQSQNEYIANLKDQLQE 257

++ +N H Y + + E+ R+++ +++ ++ + +VQ + + + D+ E

Sbjct: 226 QKVQLDNLQHVKYQLAQQLEEARRRLEDAERERSQLQAQLH-QVQLELDSVRTALDE--E 282

Query: 258 MKAKSNLENRYMKTNTELQIAQTQKKCNRTEELLVEEIEKLRMKT-EEEARTHTEIEMFL 316

A++ E++ NTE I Q + K + L EE+E LR K +++A + IE+ L

Sbjct: 283 SAARAEAEHKLALANTE--ITQWKSKFDAEVALHHEEVEDLRKKMLQKQAEYEEQIEIML 340

Query: 317 RKEQQ--KLEERLEFWMEKYDKDTEMKQNELNALKATKASDLAHLQDLAKMIREYEQVII 374

+K Q K + RL+ +E DEQN+L+K +LK + E+I

Sbjct: 341 QKISQLEKAKSRLQSEVEVLIVDLEKAQNTIAILERAK EQLEKTVNELKVRID 393

Query: 375 EDRIEKERSKKKVKQDLLELKSVIKL 400

E +E E ++++ + L EL+ + L

Sbjct: 394 ELTVELEAAQREARAALAELQKLKNL 419

Score = 118 (17.7 bits), Expect = 1.3e-03, P = 1.3e-03
Identities = 54/231 (23%), Positives = 108/231 (46%)

Query: 181 DTIKELQDSATYNSLLQ- ALSKERENKMHFYDI IAREEKG-RKQIISLQKQLINVKK 235

D +KE+ D LQ L+++ E + RE + Q+ +Q +L +V+

Sbjct: 218 DLLKEIHDQKVQLDNLQHVKYQLAQQLEEARRRLEDAERERSQLQAQLHQVQLELDSVRT 277

Query: 236 EWQFE--VQSQNEY-IANLKDQLQEMKAKSNLENRYMKTNTE-LQIAQTQKKCNRTEELL 291

E +++ E+ +A ++ + K+K + E E L+ QK+ E++ 

Sbjct: 278 ALDEESAARAEAEHKLALANTEITQWKSKFDAEVALHHEEVEDLRKKMLQKQAEYEEQIE 337

Query: 292 VEEIEKLRMKTEEEARTHTEIEMF— LRKEQQKLE— ERLEFWMEKYDKDTEMKQNELN 346 

+ ++K+ + + +R +E+E+ L K Q + ER + +EK + +++ +EL 
Sbjct: 338 IM-LQKISQLEKAKSRLQSEVEVLIVDLEKAQNTIAILERAKEQLEKTVNELKVRIDELT 396 

Query: 347 A-LKATKASDLAHLQDLAKMIREYEQVIIEDRIEKERSKKKVKQDLLELKSVI 398 

L+A + A L +L K+ YE+ + E + R KK++ DL E K + 
Sbjct: 397 VELEAAQREARAALAELQKLKNLYEKAV-EQKEALARENKKLQDDLHEAKEAL 448 

Score = 107 (16.1 bits), Expect = 2.1e-02, P = 2.1e-02
Identities = 49/279 (17%), Positives = 124/279 (44%)

Query: 123 ITEEGPNLPEIRHRGRFAV-EFNKMQDLVFKKPTRQTIMTTETLKKIQIDRQFFSDVIAD 181 

IE L + R A+ E K+++L K ++ + E KK+Q 0 + +AD 

Sbjct: 392 IDELTVELEAAQREARAALAELQKLKNLYEKAVEQKEALAREN-KKLQDDLHEAKEALAD 450

Query: 182 TIKELQDSATYNSLLQALSKERENKMHFYDIIAREEKGRKQ— IISLQKQLINVKKEWQF 239 

++L + N+L +E++ + R++RQ+ LQ+ I +++ Q 
Sbjct: 451 ANRKLHELDLENARLAGEIRELQTALKESEAARRDAENRAQRALAELQQLRIEMERRLQE 510 

Query: 240 EVQSQNEYIANLKDQLQEMKAKSNLENRYMKTNTELQIAQTQKKCNRTE-ELLVEEIEKL 298 

+ + N++ ++ + A L + + E+ + + + EE+V+++ 

Sbjct: 511 KEEEMEALRKNMQFEIDRLTAA--LADAEARMKAEISRLKKKYQAEIAELEMTVDNLNRA 568 

Query: 299 RMKTEEEARTHTEIEMFLRKEQQKLEERLEFWMEKYDKDTEMKQNELNALKATKASDLAH 358 

+ + t+ + +e L+ + + +L+ +++Y + Q +++AL A + + 

Sbjct: 569 NIEAQKTIKKQSEQLKILQASLEDTQRQLQQTLDQY ALAQRKVS ALS A- ELEECKV 623 

Query: 359 LQDLAKMIREYEQVIIEDRIEKERSKKKVKQDLLELKSVIKLQ 401

DA R+ ++ +E+ + V +L +K+ ++ + 

Sbjct: 624 ALDNAIRARKQAEIDLEEANGRITDLVSVNNNLTAIKNKLETE 666 
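
The percentages printed in the HSP headers above follow from the identity and positive counts. A minimal sketch, assuming the percentage is the ratio rounded down to a whole number (this matches every header in this report, for example 108/231 printed as 46%); the helper name `pct` is hypothetical and not part of any BLAST tool:

```python
def pct(matches: int, aligned: int) -> int:
    """Whole-number percentage, rounded down (assumed convention of the report)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return 100 * matches // aligned


# (identities, positives, aligned length) taken from the three HSP headers above
hsps = [(66, 125, 259), (54, 108, 231), (49, 124, 279)]
for identities, positives, aligned in hsps:
    print(
        f"Identities = {identities}/{aligned} ({pct(identities, aligned)}%), "
        f"Positives = {positives}/{aligned} ({pct(positives, aligned)}%)"
    )
```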



Pedant information for DKFZphtes3_7b22, frame 2 



Report for DKFZphtes3_7b22.2



[LENGTH] 443
[MW] 51917.95
[pI] 6.18
[HOMOL] PIR:S28589 trichohyalin - rabbit 2e-08
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 7e-07
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 7e-07
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1322] 5e-06
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 03.13 meiosis [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 11.01 stress response [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR216c] 3e-05
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 6e-05
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR095w] 6e-05
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YER008c] 1e-04
[FUNCAT] 08.16 extracellular transport [S. cerevisiae, YER008c] 1e-04
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YER008c] 1e-04
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR356w] 2e-04
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YDL207w] 4e-04
[FUNCAT] 04.07 rna transport [S. cerevisiae, YDL207w] 4e-04
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YKL201c] 5e-04
[EC] 3.6.1.32 Myosin ATPase 3e-08
[PIRKW] phosphotransferase 6e-06
[PIRKW] citrulline 8e-06
[PIRKW] tandem repeat 1e-07
[PIRKW] heart 6e-06
[PIRKW] polymorphism 4e-06
[PIRKW] serine/threonine-specific protein kinase 6e-06
[PIRKW] DNA binding 8e-08
[PIRKW] muscle contraction 1e-07
[PIRKW] actin binding 3e-08
[PIRKW] ATP 3e-08
[PIRKW] thick filament 1e-07
[PIRKW] phosphoprotein 3e-08
[PIRKW] glycoprotein 4e-06
[PIRKW] skeletal muscle 1e-07
[PIRKW] calcium binding 8e-06
[PIRKW] alternative splicing 3e-08
[PIRKW] coiled coil 3e-08
[PIRKW] P-loop 3e-08
[PIRKW] heptad repeat 4e-06
[PIRKW] methylated amino acid 3e-08
[PIRKW] basement membrane 4e-06
[PIRKW] cardiac muscle 6e-06
[PIRKW] extracellular matrix 4e-06
[PIRKW] hydrolase 3e-08
[PIRKW] membrane protein 4e-06
[PIRKW] EF hand 8e-06
[PIRKW] cytoskeleton 8e-06
[PIRKW] hair 8e-06
[SUPFAM] myosin heavy chain 3e-08
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 6e-
[SUPFAM] calmodulin repeat homology 8e-06
[SUPFAM] myosin motor domain homology 3e-08
[SUPFAM] trichohyalin 8e-06
[SUPFAM] protein kinase homology 6e-06
[PROSITE] AMIDATION 2
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 12
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 4
[PROSITE] ASN_GLYCOSYLATION 1
[KW] All_Alpha
[KW] LOW_COMPLEXITY 10.61 %



SEQ MEEDSLEDSNLPPKVWHSEMTVSVTGEPPSTVEEEGIPKETDIEIIPEIPETLEPLSLPD 

SEG xxxxxxxxxxxxxxxxxxxxxxx

PRD cccccccccccccccccceeeeeccccccceeeeecccccceeeeeeccccccccccccc

SEQ VLRISAVLEDTTDQLSILNYIMPVQYEGRQSICVKSREMNLEGTNLDKLPMASTITKIPS 

SEG 

PRD chhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ PLITEEGPNLPEIRHRGRFAVEFNKMQDLVFKKPTRQTIMTTETLKKIQIDRQFFSDVIA 

SEG

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ DTIKELQDSATYNSLLQALSKERENKMHFYDIIAREEKGRKQIISLQKQLINVKKEWQFE

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ VQSQNEYIANLKDQLQEMKAKSNLENRYMKTNTELQIAQTQKKCNRTEELLVEEIEKLRM 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KTEEEARTHTEIEMFLRKEQQKLEERLEFWMEKYDKDTEMKQNELNALKATKASDLAHLQ 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ DLAKMIREYEQVIIEDRIEKERSKKKVKQDLLELKSVIKLQAWWRGTMIRREIGGFKMPK

SEG x

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccc 

SEQ DKVDSKDSKGKGKGKDKRRGKKK 

SEG xxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccc 



Prosite for DKFZphtes3_7b22.2

PS00001 285->289 ASN_GLYCOSYLATION PDOC00001
PS00004 152->156 CAMP_PHOSPHO_SITE PDOC00004
PS00005 164->167 PKC_PHOSPHO_SITE PDOC00005
PS00005 182->185 PKC_PHOSPHO_SITE PDOC00005
PS00005 280->283 PKC_PHOSPHO_SITE PDOC00005
PS00005 383->386 PKC_PHOSPHO_SITE PDOC00005
PS00006 5->9 CK2_PHOSPHO_SITE PDOC00006
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006
PS00006 41->45 CK2_PHOSPHO_SITE PDOC00006
PS00006 57->61 CK2_PHOSPHO_SITE PDOC00006
PS00006 104->108 CK2_PHOSPHO_SITE PDOC00006
PS00006 182->186 CK2_PHOSPHO_SITE PDOC00006
PS00006 243->247 CK2_PHOSPHO_SITE PDOC00006
PS00006 262->266 CK2_PHOSPHO_SITE PDOC00006
PS00006 271->275 CK2_PHOSPHO_SITE PDOC00006
PS00006 302->306 CK2_PHOSPHO_SITE PDOC00006
PS00006 308->312 CK2_PHOSPHO_SITE PDOC00006
PS00006 310->314 CK2_PHOSPHO_SITE PDOC00006
PS00007 261->269 TYR_PHOSPHO_SITE PDOC00007
PS00007 184->193 TYR_PHOSPHO_SITE PDOC00007
PS00009 218->222 AMIDATION PDOC00009
PS00009 439->443 AMIDATION PDOC00009
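
The protein kinase C entries (PS00005) in the Prosite listing above can be reproduced with a simple pattern scan. The sketch below is illustrative only and is not the tool that produced this report: it assumes the PROSITE consensus [ST]-x-[RK] for PS00005, uses the 443-residue frame 2 peptide as listed earlier, and reports 1-based start positions. The names `PEPTIDE` and `pkc_sites` are hypothetical. It is limited to the PKC pattern because how overlapping matches are reported for other patterns is not stated in the source.

```python
import re

# Frame 2 peptide of DKFZphtes3_7b22 (443 residues), copied from the listing.
PEPTIDE = (
    "MEEDSLEDSNLPPKVWHSEMTVSVTGEPPSTVEEEGIPKETDIEIIPEIP"
    "ETLEPLSLPDVLRISAVLEDTTDQLSILNYIMPVQYEGRQSICVKSREMN"
    "LEGTNLDKLPMASTITKIPSPLITEEGPNLPEIRHRGRFAVEFNKMQDLV"
    "FKKPTRQTIMTTETLKKIQIDRQFFSDVIADTIKELQDSATYNSLLQALS"
    "KERENKMHFYDIIAREEKGRKQIISLQKQLINVKKEWQFEVQSQNEYIAN"
    "LKDQLQEMKAKSNLENRYMKTNTELQIAQTQKKCNRTEELLVEEIEKLRM"
    "KTEEEARTHTEIEMFLRKEQQKLEERLEFWMEKYDKDTEMKQNELNALKA"
    "TKASDLAHLQDLAKMIREYEQVIIEDRIEKERSKKKVKQDLLELKSVIKL"
    "QAWWRGTMIRREIGGFKMPKDKVDSKDSKGKGKGKDKRRGKKK"
)

# [ST]-x-[RK]; the lookahead lets overlapping candidates be found as well.
PKC_SITE = re.compile(r"(?=([ST].[RK]))")


def pkc_sites(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched tripeptide) for each candidate PKC site."""
    return [(m.start() + 1, m.group(1)) for m in PKC_SITE.finditer(seq)]


for start, motif in pkc_sites(PEPTIDE):
    # The report prints ranges as start->start+3 for this three-residue pattern.
    print(f"PS00005 {start}->{start + 3} {motif}")
```

A pattern match only marks a candidate site; it says nothing about whether the residue is phosphorylated in vivo.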



(No Pfam data available for DKFZphtes3_7b22.2)

DKFZphtes3_7dl7



group: testes derived 

DKFZphtes3_7dl7 encodes a novel 633 amino acid protein with weak similarity to human KIAA0454.
Pfam predicts a TNFR/NGFR cysteine-rich region.

No informative BLAST results; no predictive Prosite or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



similarity to KIAA0454 

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 3608 bp 

Poly A stretch at pos. 3587, polyadenylation signal at pos. 3570



1 GGGAAGTTAC GGCGAAGTCC ACCCAGCGTT TCTCAGGCAA TCTGAAGGCA 
51 AATCCTGTTT AGACCCAGGC GAAGGTTCCT GGTGACCCAG GCTCTCACCA 
101 GCCAATTGTC CCTTGCCGTC CTCCTGAGGG TATCTGGAGC TTCAGTGCTG 
151 TGTGCTCTTG GCCTCCACAC TGGGGATGCC ACTGACTCCC ACTGTCCAGG 
201 GCTTCCAGTG GACTCTCCGA GGCCCTGATG TAGAAACTTC CCCATTCGGT 
251 GCACCAAGAG CAGCCTCACA TGGTGTGGGC CGACATCAAG AGCTGCGAGA 
301 TCCAACAGTC CCTGGCCCCA CCTCTTCTGC CACAAACGTC AGCATGGTGG 
351 TATCTGCCGG CCCTTGGTCC GGTGAGAAGG CAGAGATGAA CATTCTAGAA 
401 ATCAACAAGA AATCGCGCCC CCAGCTGGCA GAGAACAAAC AGCAGTTCAG 
451 AAACCTCAAA CAGAAATGTC TTGTAACTCA AGTGGCCTAC TTCCTGGCCA
501 ACCGGCAAAA TAATTACGAC TATGAAGACT GCAAAGACCT CATAAAATCT 
551 ATGCTGAGGG ATGAGCGGCT GCTCACAGAA GAGAAGCTTG CAGAGGAGCT 
601 CGGGCAAGCT GAGGAGCTCA GGCAATATAA AGTCCTGGTT CACTCTCAGG 
651 AACGAGAGCT GACCCAGTTA AGGGAGAAGT TACAGGAAGG GAGAGATGCC 
701 TCCCGCTCAT TGAATCAGCA TCTCCAGGCC CTCCTCACTC CGGATGAGCC 
751 GGACAACTCC CAGGGACGGG ACCTCCGAGA ACAGCTGGCT GAGGGATGTA
801 GGCTGGCACA GCACCTCGTC CAAAAGCTCA GCCCAGAAAA TGATGACGAT 
851 GAGGATGAAG ATGTTAAAGT TGAGGAGGCT GAGAAAGTAC AGGAATTATA 
901 TGCCCCCAGG GAGGTGCAGA AGGCTGAAGA AAAGGAAGTC CCTGAGGACT 
951 CACTGGAGGA GTGTGCCATC ACTTGTTCAA ATAGCCACCA CCCTTGTGAG 
1001 TCCAACCAGC CTTACGGGAA CACCAGAATC ACATTTGAGG AAGACCAAGT 
1051 CGACTCAACT CTCATTGACT CATCCTCTCA TGATGAATGG TTGGATGCTG 
1101 TATGCATTAT CCCAGAAAAT GAAAGTGATC ATGAGCAAGA GGAAGAAAAA 
1151 GGGCCAGTGT CTCCCAGGAA TCTGCAGGAG TCTGAAGAGG AGGAAGCCCC 
1201 CCAGGAGTCC TGGGATGAAG GTGATTGGAC TCTCTCAATT CCTCCTGACA 
1251 TGTCTGCCTC ATACCAGTCT GACAGGAGCA CCTTTCACTC AGTAGAGGAA 
1301 CAGCAAGTCG GCTTGGCTCT TGACATAGGC AGACATTGGT GTGATCAAGT 
1351 GAAAAAGGAG GACCAAGAGG CCACAAGTCC CAGGCTCAGC AGGGAGCTGC 
1401 TGGATGAGAA AGAGCCTGAA GTCTTGCAGG ACTCACTGGA TAGATTTTAT 
1451 TCAACTCCTT TTGAGTACCT GGAACTGCCT GACTTATGCC AGCCCTACAG 
1501 AAGTGACTTT TACTCATTGC AGGAACAACA CCTTGGCTTG GCTCTTGACT 
1551 TGGACAGAAT GAAAAAGGAC CAAGAAGAGG AAGAAGACCA AGGCCCACCA 
1601 TGCCCCAGGC TCAGCAGAGA GCTGCCGGAG GTAGTAGAGC CTGAGGACTT 
1651 GCAGGACTCA CTGGATAGAT GGTATTCGAC TCCTTTCAGT TATCCAGAAC 
1701 TGCCTGATTC ATGCCAGCCC TACGGAAGTT GCTTTTACTC ATTGGAGGAA 
1751 GAACACGTTG GCTTTTCTCT TGACGTGGAT GAAATTGAAA AGTACCAAGA 
1801 AGGGGAAGAA GATCAAAAGC CACCATGCCC CAGGCTCAAC GAGGTGCTGA 
1851 TGGAAGCAGA AGAGCCTGAA GTCTTGCAGG ACTCACTGGA TAGATGTTAT 
1901 TCGACTACTT CAACTTACTT TCAACTACAT GCCTCATTCC AGCAGTACAG 
1951 AAGTGCCTTT TACTCATTTG AGGAACAGGA CGTCAGCTTG GCCCTTGACG 
2001 TGGACAATAG GTTTTTTACT TTGACAGTGA TAAGGCACCA CCTGGCCTTC 
2051 CAGATGGGAG TCATATTCCC ACACTAAGCA GCCCTTACTA AGCTGAGAGA 
2101 TGTCATTGCT GCAGGCAGGA CCTATAGGCA CATGTAGGTT TGAATGAAAC 
2151 TGTAGTTCCC TTTGGAAGCC CAGTCATAGG ATGGGAAAGT GGGCATGGCT 
2201 CTATTCCTAT TCTCAGACCA TGCCAGTGGC CACCTGTGCT CAGTCTGAAG 
2251 ACGTTGGACC CAAGTTAGGT GTGACACGTT CACACGACTA TGTAGCACAT 
2301 GCCGGGAGTG ATCTGCCAGA CATTCTAATT TGAACCAGAT ATCTCTGGGT 
2351 AGCTACAAAG TTCCTCAGGG GTTTCATTTT GCAGGCATGT CTCTGAGCTT 
2401 CTATACCTGC TCAAGGTCAG TGTCATCTTT GTGTTTAGCT CATCCAAAGG 
2451 TGTTACCCTG GTTTCATTGA ACCTAACCCC ATTCTTTGTA TCTTCAGTGT 
2501 TGGTTTGTTT TAGCTGATCC ATCTGTAACA CAGGAGGGAT CCTTGGCTGA 
2551 GGATTGTATT TCAGAACCAC TGACTGCTCT TGACAGTTGT TAACCCACTA 
2601 GGCTCCTTTG AGTAGAGAAG CCATAGTCCT TCAGCCTCCA ATTGATATCA 
2651 ATACTTAGGA AGACCACAGC TAGACGGACA AACAGCATTG GGAGGCCTTA
2701 GTCCTGCTCC TTTCAATTCC ATCCTGTAAA GAACAGGAGT CAGGAGCCGC
2751 TGGCAAGAGA CAGCATGTCA CCTGGGACTC TGCCAGTGCA GAATATGAAC
2801 AATGCCATGT TCTTGCAGAA AATGCTTAGC CTGAGTTTCA TAGGAGGTAA
2851 TCACCAGACA ACTGCAGAAT GTAGAACACT GAGCAGGACA ACTGACCTGT
2901 CTCCTTCACA CAGTCCACGT CACCACGAAT CACACAACAA AAAGGAGGAG
2951 AGATATTTTG GGTTCAGAAG AAGTAAATGA TAATGTAGCT ACATTTCTTT
3001 AGTTATTTTG AACCCCAAAT ATTTCCTCAT CTTTTTGTTG TTGTCATTGA
3051 TTTTGGTGAC ATGGACTTGT TTGTAGAGGA CAGGTCAGCT GTCTGGCTCA
3101 ATGGTCTACA TTCTGAAGTT GTCTGAAAAT GTCTTCATGA TTAAATTCAG
3151 CCTAAACGTT TCATCAAGAA CACTACAGAG TCGATACTGT GAGTTTCCAA
3201 CCTCAGCCCA TCTGTGGGCA GAGAAGGTCT AGTTTGTCCA TCAGCATTAT
3251 CATGATATCA GGACTGGTTA CTTGGTTAAG GAGGGGTCTA GGAGATCTGT
3301 CCCTTTTAGA GACACCTTAC TTATGATGAA GTATTTGGGA GAGTGGTTTT
3351 TCAAAGTAGA AATGTCCTGT ATTCCAGTGA TCATCCTCTA AACGTTTTAT
3401 CATTTATTAA TCATCCCTGC CTGTGTCTAT TATTATATTC ATATCTCTAC
3451 GCTGGAAATT TGCTGCCTCA ATGTTTACTG TGCCTTTGTT TTTGCTAGTG
3501 TGTGTTGTTG AAAAAAAAAC ATTCTCTGCC TGAGTTTTAA TTTTTGTCCA
3551 AAGTTATTTT AATCTATACA ATTAAAAACT TTTGCCTATC AAAAAAAAAA
3601 AAAAAAAA



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 176 bp to 2074 bp; peptide length: 633 
Category: similarity to known protein 



1 MPLTPTVQGF QWTLRGPDVE TSPFGAPRAA SHGVGRHQEL RDPTVPGPTS
51 SATNVSMVVS AGPWSGEKAE MNILEINKKS RPQLAENKQQ FRNLKQKCLV
101 TQVAYFLANR QNNYDYEDCK DLIKSMLRDE RLLTEEKLAE ELGQAEELRQ
151 YKVLVHSQER ELTQLREKLQ EGRDASRSLN QHLQALLTPD EPDNSQGRDL
201 REQLAEGCRL AQHLVQKLSP ENDDDEDEDV KVEEAEKVQE LYAPREVQKA
251 EEKEVPEDSL EECAITCSNS HHPCESNQPY GNTRITFEED QVDSTLIDSS
301 SHDEWLDAVC IIPENESDHE QEEEKGPVSP RNLQESEEEE APQESWDEGD
351 WTLSIPPDMS ASYQSDRSTF HSVEEQQVGL ALDIGRHWCD QVKKEDQEAT
401 SPRLSRELLD EKEPEVLQDS LDRFYSTPFE YLELPDLCQP YRSDFYSLQE
451 QHLGLALDLD RMKKDQEEEE DQGPPCPRLS RELPEVVEPE DLQDSLDRWY
501 STPFSYPELP DSCQPYGSCF YSLEEEHVGF SLDVDEIEKY QEGEEDQKPP
551 CPRLNEVLME AEEPEVLQDS LDRCYSTTST YFQLHASFQQ YRSAFYSFEE
601 QDVSLALDVD NRFFTLTVIR HHLAFQMGVI FPH

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7dl7, frame 2 

PIR:T00069 hypothetical protein KIAA0454 - human (fragment), N = 1,
Score = 199, P = 1e-11

PIR:A45592 liver stage antigen LSA-1 - Plasmodium falciparum, N = 1,
Score = 158, P = 2.7e-07



>PIR:T00069 hypothetical protein KIAA0454 - human (fragment)
Length = 1,882

HSPs:

Score = 199 (29.9 bits), Expect = 1.0e-11, P = 1.0e-11
Identities = 74/261 (28%), Positives = 122/261 (46%)

Query: 117 EDCKDLIKSMLRDERLLT EEKLAEELGQAEELRQYKVLVHSQERELTQLREKLQEG 172 

+D + LI+ + + E L EEKLAEEL A +Y L+ Q REL+ LR+K++EG 

Sbjct: 964 KDLESLIQRVSQLEAQLPKNGLEEKLAEELRSASWPGKYDSLIQDQARELSYLRQKIREG 1023

Query: 173 RDASRSLNQH LQALLTPDEPDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDD 225

R + +H + LL ++ D G+ REQLA+G +L + L KLS ++ 

Sbjct: 1024 RGICYLITRHAKDTVKSrEDLLRSNDIDYYLGQSFREQLAQGSQLTERLTSKLSTKDHKS 1083 

Query: 226 EDEDVKVEEAEKVQELYAPREVQKAEEK-EVPEDSLEECAITCSNSHHPCESNQPYGNTR 284 

E + +£ L RE+Q+ E+ EV + L+ ++T S+SH +S++ +T 

Sbjct: 1084 EKDQAGLEPLA LRLSRELQEKEKVIEVLQAKLDARSLTPSSSHALSDSHRSPSSTS 1139 

Query: 285 ITFEEDQV — DSTLIDSSSHDEWLDAVCIIPENESDHEQEEEKGPVSPRNLQESEEEEAP 342 

+E + D ++ +H E A P + +S + S + A 

Sbjct: 1140 FLSDELEACSDMDIVSEYTHYEEKKAS PSHSDSIHHSSHSAVLSSKPSSTSASQGAK 1196 

Query: 343 QESWDEGDWTLSIPPDMSASYQSDRSTFH 371

ES + +L P + S FH 

Sbjct: 1197 AES-NSNPISLPTPQNTPKEANQAHSGFH 1224 

Score = 89 (13.4 bits), Expect = 1.1e-01, P = 1.0e-01
Identities = 35/89 (39%), Positives = 44/89 (49%)

Query: 464 KDQEEEEDQG PPCPRLSRELPEVVEP-EDLQDSLDRWYSTPFSYPELPDSCQ-PYGS 518

KD + E+DQ P RLSREL E + E LQ LD TP S L DS + P + 
Sbjct: 1079 KDHKSEKDQAGLEPLALRLSRELQEKEKVIEVLQAKLDARSLTPSSSHALSDSHRSPSST 1138 

Query: 519 CFYSLEEEHVGFSLDVDEIEKYQEGEEDQKPP 550 

F S E E D+D + +Y EE + P 

Sbjct: 1139 SFLSDELEACS DMDIVSEYTHYEEKKASP 1167 

Score = 73 (11.0 bits), Expect = 4.8e+00, P = 9.9e-01
Identities = 31/88 (35%), Positives = 40/88 (45%)

Query: 390 DQVKKEDQEATSP RLSRELLD-EKEPEVLQDSLDRFYSTPFEYLELPDLCQ-PYRSD 444

D ++DQ P RLSREL + EK EVLQ LD TP L D + P + 

Sbjct: 1080 DHKSEKDQAGLEPLALRLSRELQEKEKVIEVLQAKLDARSLTPSSSHALSDSHRSPSSTS 1139 

Query: 445 FYSLQEQHLGLALDLDRMKKDQEEEEDQGPP 475

F S L D+D + + EE + P 

Sbjct: 1140 FLS DELEACSDMDIVSEYTHYEEKKASP 1167 

Score = 68 (10.2 bits), Expect = 1.1e-01, P = 1.0e-01
Identities = 36/156 (23%), Positives = 68/156 (43%)

Query: 31 SHGVGRHQELRDPTV PGPTSSATNVSMVVSAGPWS GEKAEMNILEINKK 79

S G +HQE +TVPPS + V A G++++ +

Sbjct: 684 SPGKHQHQEEGNVTVRPFPRPQSLDLGATFTVDAHQLDNQSQPRDPGPQSAFSLPGSTQH 743

Query: 80 SRPQLAENKQQFRNLKQKCLVTQVAYFL-ANRQNNYDYE-DCKDLIKSMLRDERLLTEEK 137
R QL++ KQ++++L++K L+++ F AN Y + L+K + ++ ++

Sbjct: 744 LRSQLSQCKQRYQDLQEKLLLSEATVFAQANELEKYRVMLTGESLVKQDSKQIQVDLQDL 803

Query: 138 LAEELGQAEELRQYKVLVHSQERELTQLREK-LQEG 172

E G++E + + + E L+E L EG

Sbjct: 804 GYETCGRSENEAEREETTSPECEEHNSLKEMVLMEG 839

Score = 65 (9.8 bits), Expect = 2.2e-01, P = 2.0e-01
Identities = 23/96 (23%), Positives = 52/96 (54%)

Query: 123 IKSMLRDERLLTEEKLAEELGQAEE- -LRQYKVLVHSQERELTQLREKLQEGRDASRS 178
++ + D+ + E + E+ EE LRQ ++ V ++ +L +LR+ L ++ +

Sbjct: 5 LRQRIHDKAVALERAIDEKFSALEEKEKELRQLRIAVRERDHDLERLRDVLS SNEA 60

Query: 179 LNQHLQALLTPDEPDNSQGRDLREQLAEGCRLAQHLVQKL 218
Q +++LL ++G ++ EQL+ C+ Q L +++

Sbjct: 61 TMQSMESLL RAKGLEV-EQLSTTCQNLQWLKEEM 93

Score = 61 (9.2 bits), Expect = 5.5e-01, P = 4.2e-01
Identities = 27/95 (28%), Positives = 47/95 (49%)

Query: 134 TEEK-LAEELGQAEELRQY KVLVHSQERELTQLREKLQEGRDASRSLNQHLQALLT 188

+E K L +LG+ EE R Y +LV +++ L+ +LQ ++L +++L

Sbjct: 855 SERKPLENQLGKQEEFRVYGKSENILV--LRKDIKDLKAQLQNANKVIQNLKSRVRSLSV 912

Query: 189 PDEPDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDDEDE 228

+ +S R R+ A G ++ SP + DEDE

Sbjct: 913 TSDYSSSLERP-RXLRAVGT LEGSS PHSVPDEDE 945

Score = 57 (8.6 bits), Expect = 1.4e+00, P = 7.5e-01 
Identities = 26/92 (28%), Positives = 47/92 (51%) 

Query: 127 LRDERLLTEEKLAEELGQAEEL RQYKVLVHSQERELTQLREKLQEGRDASRSLNQHL 183 

L E LL EK+A Q +E+ R+ ++L+ + L R+LE ARL L 
Sbjct: 358 LTQEVLLLREKVASVESQGQEISGNRRQQLLLMLEG--LVDERSRLNEALQAERQLYSSL 415

Query: 184 QALLTPDEPDNSQ-GRDLREQLAEGCRLAQHLVQKL 218

P++S+ R L+ +L EG ++ + ++++ 
Sbjct: 416 VKFHA — HPESSERDRTLQVEL-EGAQVLRSRLEEV 448 



Score = 54 (8.1 bits), Expect = 2.7e+00, P = 9.3e-01
Identities = 61/264 (23%), Positives = 121/264 (45%)



Query: 3 LTPTVQGFQWTLRGPDVETSPFGAPRAASHGVGRHQE--LRDPTVPGPTSSATNVSMVVS 60

L+ T Q QW L+ ++ET F + + + + L D SAT ++ 
Sbjct: 79 LSTTCQNLQW-LK-EEMETK-FSRWQKEQESIIQQLQTSLHDRNKEVEDLSAT LLCK 132 

Query: 61 AGPWSGEKAEMNILEINKKSR—-PQLAENKQQFRNLKQKCLVTQVAYFLANRQNNYDYE 117 

GP E AE + +K R L++ +Q L+ + + + ++ R+ 

Sbjct: 133 LGPGQSEIAEELCQRLQRKERMLQDLLSDRNKQV--LEHEMEIQGLLQSVSTREQE-SQA 189 

Query: 118 DCKDLIKSMLRDERLLTEEKLAEELGQAEELRQYKVLVHSQERELT QLREKLQEG-- 172 

+ L+++++ ER +L+LG+L ++ +Q+ E+T +L ++ +G 
Sbjct: 190 AAEKLVQALM — ERNSELQALRQYLGGRDSLMS-QAPISNQQAEVTPTGRLGKQTDQGSM 246 

Query: 173 RDASRSLNQHLQALLTPDEPDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDDEDEDVKV 232 

+ SR + L A P ++ G DL + +A G L ++LS N +E E + 

Sbjct: 247 QIPSRDDSTSLTAKEDVSIPRSTLG-DL-DTVA-G LEKELS--NAKEELELMAK 295 

Query: 233 EEAEKVQELYAPREVQKAEEKEVPEDSLEECAIT 266 

+E E EL A + + +E+E+ + + ++T 
Sbjct: 296 KERESQMELSALQSMMAVQEEELQVQAADMESLT 329 

Score = 49 (7.4 bits), Expect = 6.3e+00, P = 1.0e+00
Identities = 21/87 (24%), Positives = 39/87 (44%)

Query: 192 PDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDDEDEDVKVEEAEKVQELYAPREVQKAE 251 

P ++Q LR QL++ + Q L +KL + +EEK++ +K+ 

Sbjct: 738 PGSTQ--HLRSQLSQCKQRYQDLQEKLLLS EATVFAQANELEKYRVMLTGESLVKQD 792 

Query: 252 EKEVPEDSLEECAI-TCSNSHHPCESNQ 278

K++ D L++ TC S + E + 
Sbjct: 793 SKQIQVD-LQDLGYETCGRSENEAEREE 819 

Score = 46 (6.9 bits), Expect = 6.3e+00, P = 1.0e+00
Identities = 19/77 (24%), Positives = 39/77 (50%)

Query: 112 NNYDYEDCKDLIKSMLRDERLLTEEKLAEELGQAEELRQYKVLVHSQERELTQLREKLQ- 170

+ ++ E+ K+ K + E ++T+E L+E QAE R+ + + + + L+E+L
Sbjct: 597 DGWEIEEDKE--KGEVMVETVVTKEGLSESSLQAE-FRKLQGKLKNAHNIINLLKEQLVL 653

Query: 171 EGRDASRSLNQHLQALLT 188

+ + + L L LT
Sbjct: 654 SSKEGNSKLTPELLVHLT 671
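
The Expect and P values in the HSP headers above are related. A minimal sketch, assuming the usual Poisson relation P = 1 - exp(-E) between the expected number of chance HSPs and the probability of observing at least one; the function name `p_from_expect` is hypothetical. Under that assumption the P column of this report can be recomputed from the Expect column:

```python
import math


def p_from_expect(expect: float) -> float:
    """P(at least one chance HSP) under Poisson statistics: 1 - exp(-E)."""
    if expect < 0:
        raise ValueError("Expect must be non-negative")
    return -math.expm1(-expect)  # numerically stable form of 1 - exp(-E)


# Expect values taken from the HSP headers for PIR:T00069
for expect in (1.1e-01, 2.2e-01, 5.5e-01, 1.4e+00, 2.7e+00, 4.8e+00, 6.3e+00):
    print(f"Expect = {expect:.1e}, P = {p_from_expect(expect):.1e}")
```

For very small Expect values the two numbers coincide to the printed precision, which is why the best-scoring HSP shows identical Expect and P.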



Pedant information for DKFZphtes3_7dl7, frame 2



Report for DKFZphtes3_7dl7.2



[LENGTH] 633
[MW] 72951.15
[pI] 4.40
[HOMOL] PIR:T00069 hypothetical protein KIAA0454 - human (fragment) 2e-11
[BLOCKS] BL00201E
[PROSITE] MYRISTYL 2
[PROSITE] CK2_PHOSPHO_SITE 14
[PROSITE] PKC_PHOSPHO_SITE 4
[PROSITE] ASN_GLYCOSYLATION 2
[PFAM] TNFR/NGFR cysteine-rich region
[KW] All_Alpha
[KW] LOW_COMPLEXITY 4.90 %
[KW] COILED_COIL 6.95 %



SEQ MPLTPTVQGFQWTLRGPDVETSPFGAPRAASHGVGRHQELRDPTVPGPTSSATNVSMVVS

SEG

PRD ccccceeeeeeeecccccccccccccccccccccccccccccccccccccceeeeeeeee

COILS 

SEQ AGPWSGEKAEMNILEINKKSRPQLAENKQQFRNLKQKCLVTQVAYFLANRQNNYDYEDCK 

SEG 

PRD ccccccchhhhhhhheeecccchhhhhhhhhhhcccccchhhhhhhhhhcccccccccch 

COILS

SEQ DLIKSMLRDERLLTEEKLAEELGQAEELRQYKVLVHSQERELTQLREKLQEGRDASRSLN

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ QHLQALLTPDEPDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDDEDEDVKVEEAEKVQE 

SEG xxxxxxxxxxxxxxxx

PRD hhhhhhhhccccccccchhhhhhhhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhh 

COILS CCCCCCC 

SEQ LYAPREVQKAEEKEVPEDSLEECAITCSNSHHPCESNQPYGNTRITFEEDQVDSTLIDSS 

SEG 

PRD hhhcchhhhhhhhhhcchhhhhhhccccccccccccccccccceeeeecccccccccccc 

COILS 

SEQ SHDEWLDAVCIIPENESDHEQEEEKGPVSPRNLQESEEEEAPQESWDEGDWTLSIPPDMS 

SEG xxxxxxxxxxxxxxx 

PRD ccchhhhheeeccccccchhhhhhcccccccccchhhhhhhccccccccccccccccccc 

COILS 

SEQ ASYQSDRSTFHSVEEQQVGLALDIGRHWCDQVKKEDQEATSPRLSRELLDEKEPEVLQDS 

SEG 

PRD ccccccccchhhhhhhhhhhhhhccccccchhhhhccccccchhhhhhhhhhhheeeecc 

COILS 

SEQ LDRFYSTPFEYLELPDLCQPYRSDFYSLQEQHLGLALDLDRMKKDQEEEEDQGPPCPRLS 

SEG 

PRD hhhhhccceeeeecccccccccccchhhhhhhhhhhhhcchhhhhhhhhhcccccccccc 

COILS 

SEQ RELPEVVEPEDLQDSLDRWYSTPFSYPELPDSCQPYGSCFYSLEEEHVGFSLDVDEIEKY

SEG 

PRD ccceeeeeccchhhhhhhhhccccccccccccccccccceeeeccceeeccccchhhhhh 

COILS 

SEQ QEGEEDQKPPCPRLNEVLMEAEEPEVLQDSLDRCYSTTSTYFQLHASFQQYRSAFYSFEE 

SEG 

PRD hcccccccccccchhhhhhhhhchhhhhccccceeecceeeehhhhhhhhhhhhhhhhhc 

COILS 

SEQ QDVSLALDVDNRFFTLTVIRHHLAFQMGVIFPH

SEG 

PRD cchhhhhhcccchhhhhhhhhhhhhhhhhcccc 

COILS 



Prosite for DKFZphtes3_7dl7.2

PS00001 54->58 ASN_GLYCOSYLATION PDOC00001
PS00001 315->319 ASN_GLYCOSYLATION PDOC00001
PS00005 13->16 PKC_PHOSPHO_SITE PDOC00005
PS00005 329->332 PKC_PHOSPHO_SITE PDOC00005
PS00005 365->368 PKC_PHOSPHO_SITE PDOC00005
PS00005 401->404 PKC_PHOSPHO_SITE PDOC00005
PS00006 188->192 CK2_PHOSPHO_SITE PDOC00006
PS00006 259->263 CK2_PHOSPHO_SITE PDOC00006
PS00006 286->290 CK2_PHOSPHO_SITE PDOC00006
PS00006 295->299 CK2_PHOSPHO_SITE PDOC00006
PS00006 300->304 CK2_PHOSPHO_SITE PDOC00006
PS00006 317->321 CK2_PHOSPHO_SITE PDOC00006
PS00006 336->340 CK2_PHOSPHO_SITE PDOC00006
PS00006 345->349 CK2_PHOSPHO_SITE PDOC00006
PS00006 372->376 CK2_PHOSPHO_SITE PDOC00006
PS00006 427->431 CK2_PHOSPHO_SITE PDOC00006
PS00006 447->451 CK2_PHOSPHO_SITE PDOC00006
PS00006 505->509 CK2_PHOSPHO_SITE PDOC00006
PS00006 522->526 CK2_PHOSPHO_SITE PDOC00006
PS00006 597->601 CK2_PHOSPHO_SITE PDOC00006
PS00008 25->31 MYRISTYL PDOC00008
PS00008 207->213 MYRISTYL PDOC00008



Pfam for DKFZphtes3_7dl7.2
HMM_NAME TNFR/NGFR cysteine-rich region

HMM *CpeGtYtDWNHvpqClpCtrCePEMGQYMvqPCTwTQNTVC*

C+ ++ + N+ ++ + ++ + +++ +++ ++VC 

Query 274 CESNQPYG-NT-RITFEEDQVDS--TLIDSSSHDEWLDAVC 310 






WO 01/12659 
DKFZphtes3_7j3



group: cell cycle 

DKFZphtes3_7j3.2 encodes a novel 628 amino acid putative protein kinase, which is related to
the C-TAK1 Cdc25C-associated protein kinase.

Cdc25C is a protein phosphatase that controls entry into mitosis by dephosphorylating Cdc2.
Cdc25C function is itself regulated by phosphorylation: phosphorylation of serine 216 of Cdc25C
mediates the binding of 14-3-3 protein to Cdc25C. C-TAK1 (Cdc twenty-five C associated protein
kinase) phosphorylates Cdc25C on serine 216 in vitro. The new protein is closely related to
C-TAK1 and is therefore likely to be involved in cell-cycle regulation as well.

The new protein can find application in modulating/blocking the cell cycle. 



strong similarity to serine/threonine-specific protein kinases 

complete cDNA, complete cds, potential start at Bp 128, few EST hits 

Sequenced by BMFZ 

Locus : unknown 

Insert length: 3443 bp 

Poly A stretch at pos. 3399, polyadenylation signal at pos. 3376 
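
The poly(A) stretch and polyadenylation signal positions quoted in these entries can be located with a simple scan of the cDNA sequence. The following is a minimal sketch, not the tool used to produce the listing: the function name `find_polya_features`, the minimum run length of 15, the 50-nucleotide search window and the choice of the hexamers AATAAA and ATTAAA are all assumptions made for illustration. Positions are returned 1-based and may differ by one from the reported values, depending on the numbering convention of the original annotation.

```python
import re

def find_polya_features(seq, min_run=15):
    """Return (signal_pos, stretch_pos), 1-based; (None, None) if no poly(A) run."""
    seq = seq.upper()
    # Terminal poly(A) stretch: the last run of at least `min_run` A's (assumed cutoff).
    runs = list(re.finditer(r"A{%d,}" % min_run, seq))
    if not runs:
        return None, None
    stretch_start = runs[-1].start()
    # Search the 50 nt upstream of the stretch for a hexamer signal (assumed window).
    window_start = max(0, stretch_start - 50)
    window = seq[window_start:stretch_start]
    hits = [m.start() for m in re.finditer(r"(?=(AATAAA|ATTAAA))", window)]
    signal_pos = window_start + hits[-1] + 1 if hits else None
    return signal_pos, stretch_start + 1

# Toy sequence (hypothetical, not taken from the listing).
toy = "GC" * 20 + "ATTAAA" + "CCAGAAAGGTGATTGG" + "A" * 20
print(find_polya_features(toy))  # (41, 63)
```

To apply it to a listing entry, strip the position numbers and spaces from the sequence block before passing the string in.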



1 GTGCTTTACT GCGCGCTCTG GTACTGCTGT GGCTCCCCGT CCTGGTGCGG 
51 GACCTGTGCC CCGCGCTTCA GCCCTCCCCG CACAGCCTAC TGATTCCCCT 
101 GCCGCCCTTG CTCACCTCCT GCTCGCCATG GAGTCGCTGG TTTTCGCGCG 
151 GCGCTCCGGC CCCACTCCCT CGGCCGCAGA GCTAGCCCGG CCGCTGGCGG 
201 AAGGGCTGAT CAAGTCGCCC AAGCCCCTAA TGAAGAAGCA GGCGGTGAAG 
251 CGGCACCACC ACAAGCACAA CCTGCGGCAC CGCTACGAGT TCCTGGAGAC 
301 CCTGGGCAAA GGCACCTACG GGAAGGTGAA GAAGGCGCGG GAGAGCTCGG 
351 GGCGCCTGGT GGCCATCAAG TCAATCCGGA AGGACAAAAT CAAAGATGAG 
401 CAAGATCTGA TGCACATACG GAGGGAGATT GAGATCATGT CATCACTCAA 
451 CCACCCTCAC ATCATTGCCA TCCATGAAGT GTTTGAGAAC AGCAGCAAGA 
501 TCGTGATCGT CATGGAGTAT GCCAGCCGGG GCGACCTTTA TGACTACATC 
551 AGCGAGCGGC AGCAGCTCAG TGAGCGCGAA GCTAGGCATT TCTTCCGGCA 
601 GATCGTCTCT GCCGTGCACT ATTGCCATCA GAACAGAGTT GTCCACCGAG 
651 ATCTCAAGCT GGAGAACATC CTCTTGGATG CCAATGGGAA TATCAAGATT 
701 GCTGACTTCG GCCTCTCCAA CCTCTACCAT CAAGGCAAGT TCCTGCAGAC 
751 ATTCTGTGGG AGCCCCCTCT ATGCCTCGCC AGAGATTGTC AATGGGAAGC
801 CCTACACAGG CCCAGAGGTG GACAGCTGGT CCCTGGGTGT TCTCCTCTAC 
851 ATCCTGGTGC ATGGCACCAT GCCCTTTGAT GGGCATGACC ATAAGATCCT 
901 AGTGAAACAG ATCAGCAACG GGGCCTACCG GGAGCCACCT AAACCCTCTG 
951 ATGCCTGTGG CCTGATCCGG TGGCTGTTGA TGGTGAACCC CACCCGCCGG 
1001 GCCACCCTGG AGGATGTGGC CAGTCACTGG TGGGTCAACT GGGGCTACGC 
1051 CACCCGAGTG GGAGAGCAGG AGGCTCCGCA TGAGGGTGGG CACCCTGGCA 
1101 GTGACTCTGC CCGCGCCTCC ATGGCTGACT GGCTCCGGCG TTCCTCCCGC 
1151 CCCCTCCTGG AGAATGGGGC CAAGGTGTGC AGCTTCTTCA AGCAGCATGC 
1201 ACCTGGTGGG GGAAGCACCA CCCCTGGCCT GGAGCGCCAG CATTCGCTCA 
1251 AGAAGTCCCG CAAGGAGAAT GACATGGCCC AGTCTCTCCA CAGTGACACG 
1301 GCTGATGACA CTGCCCATCG CCCTGGCAAG AGCAACCTCA AGCTGCCAAA 
1351 GGGCATTCTC AAGAAGAAGG TGTCAGCCTC TGCAGAAGGG GTACAGGAGG 
1401 ACCCTCCGGA GCTCAGCCCA ATCCCTGCGA GCCCAGGGCA GGCTGCCCCG 
1451 CTGCTCCCCA AGAAGGGCAT TCTCAAGAAG CCCCGACAGC GCGAGTCTGG
1501 CTACTACTCC TCTCCCGAGC CCAGTGAATC TGGGGAGCTC TTGGACGCAG 
1551 GCGACGTGTT TGTGAGTGGG GATCCCAAGG AGCAGAAGCC TCCGCAAGCT 
1601 TCAGGGCTGC TCCTCCATCG CAAAGGCATC CTCAAACTCA ATGGCAAGTT 
1651 CTCCCAGACA GCCTTGGAGC TCGCGGCCCC CACCACCTTC GGCTCCCTGG 
1701 ATGAACTCGC CCCACCTCGC CCCCTGGCCC GGGCCAGCCG ACCCTCAGGG 
1751 GCTGTGAGCG AGGACAGCAT CCTGTCCTCT GAGTCCTTTG ACCAGCTGGA 
1801 CTTGCCTGAA CGGCTCCCAG AGCCCCCACT GCGGGGCTGT GTGTCTGTGG 
1851 ACAACCTCAC GGGGCTTGAG GAGCCCCCCT CAGAGGGCCC TGGAAGCTGC 
1901 CTGAGGCGCT GGCGGCAGGA TCCTTTGGGG GACAGCTGCT TTTCCCTGAC 
1951 AGACTGCCAG GAGGTGACAG CGACCTACCG ACAGGCACTG AGGGTCTGCT 
2001 CAAAGCTCAC CTGAGTGGAG TAGGCATTGC CCCAGCCCGG TCAGGCTCTC 
2051 AGATGCAGCT GGTTGCACCC CGAGGGGAGA TGCCTTCTCC CCCACCTCCC 
2101 AGGACCTGCA TCCCAGCTCA GAAGGCTGAG AGGGTTTGCA GTGGAGCCCT 
2151 GAGCAGGGCT GGATATGGGA AGTAGGCAAA TGAAATGCGC CAAGGGTTCA 
2201 GTGTCTGTCT TCAGCCCTGC TGAACGAAGA GGATACTAAA GAGAGGGGAA 
2251 CGGGAATGCC CGCGACAGAG TCCACATTGC CTGTTTCTTG TGTACATGGG 
2301 GGGGCCACAG AGACCTGGAA AGAGAACTCT CCCAGGGCCC ATCTCCTGCA 
2351 TCCCATGAAT ACTCTGTACA CATGGTGCCT TCTAAGGACA GCTCCTTCCC
2401 TACTCATTCC CTGCCCAAGT GGGGCCAGAC CTCTTTACAC ACACATTCCC 
2451 GTTCCTACCA ACCACCAGAA CTGGATGGTG GCACCCCTAA TGTGCATGAG 
2501 GCATCCTGGG AATGGTCTGG AGTAACGCTT CGTTATTTTT ATTTTTATTT 
2551 TTATTTATTT ATTTATTTTT TTGAGACGGA GTTTCGCTCT TGGTGCCCAG
2601 GCTAGAGTGC AATGGCGCGA TCTCAGCTCA CCTCAACCTC CGCCTCCCGG 
2651 GTTCAAGCGA TTCTCCTGCC TCAGCCTCCC TAGTAGCTGG GATTACAGGC 
2701 GCCCGCCACC ATGCCCGGCT AATTTTGTAT TTTTAGTAGA GACAGGGTTT 
2751 CTCCATGTTG GTCAGGCTGG TCTCAAACTC CCGACCTCAG GTGATCCACC 
2801 CACCTCGGCC TCCCAAAGTG CTGGGATTAC AGGCGTGAGC CACCGCGCCC 
2851 CACCTAACCC TTCCTTATTT AGCCTAGGAG TAAGAGAACA CAATCTCTGT 
2901 TTCTTCAATG GTTCTCTTCC CTTTTCCATC CTCCAAACCT GGCCTGAGCC 
2951 TCCTGAAGTT GCTGCTGTGA ATCTGAAAGA CTTGAAAAGC CTCCGCCTGC 
3001 TGTGTGGACT TCATCTCAAG GGGCCCAGCC TCCTCTGGAC TCCACCTTGG 
3051 ACCTCAGTGA CTCAGAACTT CTGCCTCTAA GCTGCTCTAA AGTCCAGACT 
3101 ATGGATGTGT TCTCTAGGCC TTCAGGACTC TAGAATGTCC ATATTTATTT 
3151 TTATGTTCTT GGCTTTGTGT TTTAGGAAAA GTGAATCTTG CTGTTTTCAA 
3201 TAATGTGAAT GCTATGTTCT GGGAAAATCC ACTATGACAT CTAAGTTTTG 
3251 TGTACAGAGA GATATTTTTG CAACTATTTC CACCTCCTCC CACAACCCCC 
3301 CACACTCCAC TCCACACTCT TGAGTCTCTT TACCTAATGG TCTCTACCTA 
3351 ATGGACCTCC GTGGCCAAAA AGTACCATTA AAACCAGAAA GGTGATTGGA 
3401 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAA 



BLAST Results 



No BLAST result 



Medline entries 



98202387: 

C-TAK1 protein kinase phosphorylates human Cdc25C on serine 216 and
promotes 14-3-3 protein binding.



Peptide information for frame 2 



ORF from 128 bp to 2011 bp; peptide length: 628 
Category: strong similarity to known protein 
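
The ORF coordinates and the peptide length quoted in these entries are linked by simple arithmetic, which can be checked as follows. This is a minimal sketch under the assumption that the coordinates are 1-based and inclusive and that the stop codon lies just outside the quoted span; the function name `orf_peptide_length` is hypothetical.

```python
def orf_peptide_length(start_bp, end_bp):
    """Number of codons in an ORF given 1-based, inclusive coordinates."""
    n_bases = end_bp - start_bp + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return n_bases // 3

# ORF from 128 bp to 2011 bp: (2011 - 128 + 1) / 3 codons.
print(orf_peptide_length(128, 2011))  # 628
```

The same check applied to an ORF from 167 bp to 1396 bp gives 410 residues.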



1 MESLVFARRS GPTPSAAELA RPLAEGLIKS PKPLMKKQAV KRHHHKHNLR 

51 HRYEFLETLG KGTYGKVKKA RESSGRLVAI KSIRKDKIKD EQDLMHIRRE 

101 IEIMSSLNHP HIIAIHEVFE NSSKIVIVME YASRGDLYDY ISERQQLSER 

151 EARHFFRQIV SAVHYCHQNR VVHRDLKLEN ILLDANGNIK IADFGLSNLY 

201 HQGKFLQTFC GSPLYASPEI VNGKPYTGPE VDSWSLGVLL YILVHGTMPF 

251 DGHDHKILVK QISNGAYREP PKPSDACGLI RWLLMVNPTR RATLEDVASH 

301 WWVNWGYATR VGEQEAPHEG GHPGSDSARA SMADWLRRSS RPLLENGAKV 

351 CSFFKQHAPG GGSTTPGLER QHSLKKSRKE NDMAQSLHSD TADDTAHRPG 

401 KSNLKLPKGI LKKKVSASAE GVQEDPPELS PIPASPGQAA PLLPKKGILK 

451 KPRQRESGYY SSPEPSESGE LLDAGDVFVS GDPKEQKPPQ ASGLLLHRKG 

501 ILKLNGKFSQ TALELAAPTT FGSLDELAPP RPLARASRPS GAVSEDSILS 

551 SESFDQLDLP ERLPEPPLRG CVSVDNLTGL EEPPSEGPGS CLRRWRQDPL

601 GDSCFSLTDC QEVTATYRQA LRVCSKLT 

BLAST P hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7j3, frame 2
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_7j3; frame 2 



Report for DKFZphtes3_7j3.2

[LENGTH] 628
[MW] 69612.39
[pI] 9.01
[HOMOL] TREMBL:AB011109_1 gene: "KIAA0537"; product: "KIAA0537 protein"; Homo sapiens mRNA for KIAA0537 protein, complete cds. 1e-152
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YDR477w] 5e-66
[FUNCAT] 11.01 stress response [S. cerevisiae, YDR477w] 5e-66
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR477w] 5e-66
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YLR096w] 6e-54
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YLR096w] 6e-54
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YDR507c] 8e-52
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YDR507c] 8e-52
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL101w] 9e-51
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL101w] 9e-51
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL141c] 1e-45
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YPL153c] 6e-44
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YPL153c] 6e-44
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YPL153c] 6e-44
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YPL153c] 6e-44
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YMR001c] 2e-42
[FUNCAT] 10.02.11 key kinases [S. cerevisiae, YBL105c] 3e-34
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YKL139w CTK1 - carboxy-terminal domain] 2e-26
[FUNCAT] 03.01 cell growth [S. cerevisiae, YFR014c] 4e-28
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YGL180w] 2e-26
[FUNCAT] 06.13.04 lysosomal and vacuolar degradation [S. cerevisiae, YGL180w] 2e-26
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YGL180w] 2e-26
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YER129w] 4e-26
[FUNCAT] 02.19 metabolism of energy reserves (glycogen, trehalose) [S. cerevisiae, YPL031c] 5e-24
[FUNCAT] 01.04.04 regulation of phosphate utilization [S. cerevisiae, YPL031c] 5e-24
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YHL007c] 6e-24
[FUNCAT] 10.05.11 key kinases [S. cerevisiae, YHL007c] 6e-24
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YNR031c] 1e-22
[FUNCAT] 10.03.11 key kinases [S. cerevisiae, YNR031c] 1e-22
[FUNCAT] 03.13 meiosis [S. cerevisiae, YDR523c] 8e-22
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YDL108w] 6e-21
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YFL033c] 6e-21
[FUNCAT] 10.05.09 regulation of g-protein activity [S. cerevisiae, YBL016w] 7e-19
[FUNCAT] 10.04.11 key kinases [S. cerevisiae, YDL159w] 3e-18
[FUNCAT] 01.02.04 regulation of nitrogen and sulphur utilization [S. cerevisiae, YNL183c] 1e-17
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL183c] 1e-17
[FUNCAT] 05.07 translational control [S. cerevisiae, YDR283c] 2e-17
[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YNL020c] 4e-16
[FUNCAT] 04.03.99 other trna-transcription activities [S. cerevisiae, YOR061w] 1e-15
[FUNCAT] 10.04.99 other nutritional-response activities [S. cerevisiae, YJR059w] 5e-15
[FUNCAT] c energy conversion [M. genitalium, MG109] 3e-12
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YBR097w] 2e-08
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YBR097w] 2e-08
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YBR097w] 2e-08
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YBR097w] 2e-08
[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YHR079c] 8e-05
[FUNCAT] 01.06.10 regulation of lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YHR079c] 8e-05
[BLOCKS] BL00479C Phorbol esters / diacylglycerol binding domain proteins
[BLOCKS] BL00239B Receptor tyrosine kinase class II proteins
[BLOCKS] BL00107A Protein kinases ATP-binding region proteins
[SCOP] d1gol 5.1.1.1.9 MAP kinase Erk2 [rat (Rattus norvegicus) 1e-77
[SCOP] d1wfc_ 5.1.1.1.8 MAP kinase p38 [human (Homo sapiens) 4e-68
[SCOP] d1koa_2 5.1.1.1.7 (1-350) Twitchin, kinase domain [Caenorhabditi 2e-85
[SCOP] d1koba_ 5.1.1.1.6 Twitchin, kinase domain [California sea har 1e-80
[SCOP] d1phk_ 5.1.1.1.5 gamma-subunit of glycogen phosphorylase kinas 2e-76
[SCOP] d1irk 5.1.1.2.4 insulin receptor [Human (Homo sapiens) 1e-69
[SCOP] d1apme_ 5.1.1.1.4 cAMP-dependent PK, catalytic subunit [mouse (Mu 1e-84
[SCOP] d1fgka_ 5.1.1.2.3 Fibroblast growth factor receptor 1 [human (Hom 1e-68
[SCOP] d1ydre_ 5.1.1.1.3 cAMP-dependent PK, catalytic subunit [bovine (Bo 9e-85
[SCOP] d1fmk_3 5.1.1.2.2 (168-437) c-src tyrosine kinase [human (Hom 1e-69
[SCOP] d1cdka_ 5.1.1.1.2 cAMP-dependent PK, catalytic subunit [pig (Su 1e-85
[SCOP] d2hcka3 5.1.1.2.1 (167-437) Haemopoetic cell kinase Hck [huma 5e-66
[SCOP] d1csn 5.1.1.1.11 Casein kinase-1, CK1 [Schizosaccharomyces pombe 9e-47
[SCOP] d1jsua_ 5.1.1.1.1 Cyclin-dependent PK [Human (Homo sapiens) 1e-75
[SCOP] d1ckja_ 5.1.1.1.10 Casein kinase-1, CK1 [rat (Rattus norvegicus) 5e-54
[EC] 2.7.1.38 Phosphorylase kinase 1e-36
[EC] 2.7.1.123 Ca2+/calmodulin-dependent protein kinase 4e-40
[EC] 2.7.1.128 [Acetyl-CoA carboxylase] kinase 1e-61
[EC] 2.7.1.117 Myosin-light-chain kinase 2e-40
[EC] 2.7.1.109 [Hydroxymethylglutaryl-CoA reductase (NADPH)] kinase 1e-61
[EC] 2.7.1.37 Protein kinase 7e-42
[PIRKW] phosphotransferase 6e-66
[PIRKW] nucleus 1e-64
[PIRKW] calcium 7e-35
[PIRKW] duplication 1e-38
[PIRKW] tandem repeat 4e-39
[PIRKW] phorbol ester binding 1e-38
[PIRKW] zinc 1e-38
[PIRKW] cell cycle control 1e-42
[PIRKW] serine/threonine-specific protein kinase 8e-68
[PIRKW] oncogene 1e-40
[PIRKW] phospholipid binding 1e-38
[PIRKW] autophosphorylation 1e-64
[PIRKW] brain 1e-40
[PIRKW] heterotetramer 2e-36
[PIRKW] mitosis 7e-42
[PIRKW] polymer 1e-35
[PIRKW] magnesium 6e-66
[PIRKW] ATP 8e-68
[PIRKW] polyprotein 1e-40
[PIRKW] phosphoprotein 1e-64
[PIRKW] apoptosis 4e-39
[PIRKW] glycoprotein 7e-42
[PIRKW] leucine zipper 3e-35
[PIRKW] skeletal muscle 7e-35
[PIRKW] protein kinase 5e-41
[PIRKW] cAMP binding 3e-38
[PIRKW] testis 9e-36
[PIRKW] purine nucleotide binding 2e-49
[PIRKW] calcium binding 8e-39
[PIRKW] alternative splicing 3e-37
[PIRKW] P-loop 2e-49
[PIRKW] lipoprotein 2e-33
[PIRKW] segmentation 1e-33
[PIRKW] core protein 1e-40
[PIRKW] muscle 7e-35
[PIRKW] myristylation 2e-33
[PIRKW] EF hand 8e-39
[PIRKW] cell division 2e-40
[PIRKW] calmodulin binding 4e-40
[SUPFAM] ribosomal protein S6 kinase II 5e-36
[SUPFAM] fibronectin type III repeat homology 3e-33
[SUPFAM] immunoglobulin homology 3e-33
[SUPFAM] calcium-dependent protein kinase 8e-39
[SUPFAM] AMP-activated protein kinase 6e-66
[SUPFAM] protein kinase akt 3e-42
[SUPFAM] protein kinase SPK1 1e-42
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 8e-68
[SUPFAM] Ca2+/calmodulin-dependent protein kinase 3e-37
[SUPFAM] calmodulin repeat homology 8e-39
[SUPFAM] cAMP receptor protein cyclic nucleotide-binding domain homology 6e-33
[SUPFAM] protein kinase C zeta 1e-36
[SUPFAM] Dictyostelium cAMP-dependent protein kinase catalytic chain 1e-34
[SUPFAM] death-associated protein kinase 4e-39
[SUPFAM] pleckstrin repeat homology 3e-42
[SUPFAM] ankyrin repeat homology 4e-39
[SUPFAM] protein kinase homology 8e-68
[SUPFAM] Ca2+/calmodulin-dependent protein kinase II 8e-41
[SUPFAM] protein kinase C zinc-binding repeat homology 1e-38
[SUPFAM] twitchin 3e-33
[SUPFAM] protein kinase C delta 1e-38
[SUPFAM] cGMP-dependent protein kinase 6e-33
[SUPFAM] protein kinase cdr1 7e-42
[SUPFAM] protein kinase C C2 region homology 3e-37
[SUPFAM] protein kinase C alpha 3e-37
[SUPFAM] yeast protein kinase C 5e-36
[SUPFAM] kinase-related transforming protein 1e-41
[SUPFAM] kinase interaction domain homology 1e-42
[SUPFAM] gag-akt polyprotein 1e-40
[SUPFAM] Ca2+/calmodulin-dependent protein kinase I 4e-40
[SUPFAM] protein kinase C mu 4e-33
[PROSITE] PROTEIN_KINASE_ATP 2
[PROSITE] RGD 1
[PROSITE] MYRISTYL 4
[PROSITE] CAMP_PHOSPHO_SITE 3
[PROSITE] CK2_PHOSPHO_SITE 13
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 12
[PROSITE] ASN_GLYCOSYLATION 2
[PROSITE] PROTEIN_KINASE_ST 1
[PFAM] Eukaryotic protein kinase domain
[KW] All_Alpha
[KW] 3D
[KW] LOW_COMPLEXITY 10.51 %
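
The [MW] value in the report above is a molecular weight derived from the peptide sequence. A minimal sketch of such a calculation follows, assuming average residue masses plus one water molecule per chain. The mass table, the constant names and the function name `molecular_weight` are illustrative assumptions, and the result for a given peptide may differ slightly from the reported value depending on the mass table the original software used.

```python
# Average residue masses in daltons (amino acid minus water); assumed table.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water per chain accounts for the free termini

def molecular_weight(peptide):
    """Average molecular weight of an unmodified peptide chain in daltons."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in peptide.upper()) + WATER

# Dipeptide example: Gly-Ala.
print(round(molecular_weight("GA"), 2))  # 146.15
```

Passing the full peptide as one string, with position numbers and spaces removed, gives a figure that can be compared with the [MW] line.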

SEQ MESLVFARRSGPTPSAAELARPLAEGLIKSPKPLMKKQAVKRHHHKHNLRHRYEFLETLG
SEG xxxxxxxxxxxx
1ctpE HHHHHHHHHHHHHHHCCCCCCCC--GGGEEEEEEEE

SEQ KGTYGKVKKARESSGRLVAIKSIRKDKIKDEQDLMHIRREIEIMSSLNHPHIIAIHEVFE
SEG
1ctpE CTTTEEEEEEEETTTEEEEEEEEEHHHHHHHCCHHHHHHHHHHHHCCCTTTBCCEEEEEE

SEQ NSSKIVIVMEYASRGDLYDYISERQQLSEREARHFFRQIVSAVHYCHQNRVVHRDLKLEN
SEG
1ctpE ETTEEEEEEECTTTTBHHHHHHHHCCCCHHHHHHHHHHHHHHHHHHHHCCEECCCCCGGG

SEQ ILLDANGNIKIADFGLSNLYHQGKFLQTFCGSPLYASPEIVNGKPYTGPEVDSWSLGVLL
SEG
1ctpE EEETTTTCEEECCTTTTEET-TTT-BCCCCCCGGGCCHHHHHCCCBC-HHHHHHHHHHHH

SEQ YILVHGTMPFDGHDHKILVKQISNGAYREPPKPSDACGLIRWLLMVNPTRRATLEDVASH
SEG
1ctpE HHHHHCCTTTTTTTHHHHHHHHHHCCCCCTTTCHHHHHHHHHTTTTTGGGTTTHHHHHHC

SEQ WWVNWGYATRVGEQEAPHEGGHPGSDSARASMADWLRRSSRPLLENGAKVCSFFKQHAPG
SEG
1ctpE GG

SEQ GGSTTPGLERQHSLKKSRKENDMAQSLHSDTADDTAHRPGKSNLKLPKGILKKKVSASAE
SEG
1ctpE

SEQ GVQEDPPELSPIPASPGQAAPLLPKKGILKKPRQRESGYYSSPEPSESGELLDAGDVFVS
SEG xxxxxxxxxxxx...xxxxxxxxxxxxxxx
1ctpE

SEQ GDPKEQKPPQASGLLLHRKGILKLNGKFSQTALELAAPTTFGSLDELAPPRPLARASRPS
SEG xxxxxxxxxxxxxx
1ctpE

SEQ GAVSEDSILSSESFDQLDLPERLPEPPLRGCVSVDNLTGLEEPPSEGPGSCLRRWRQDPL
SEG xxxxxxxxxxxxx
1ctpE

SEQ GDSCFSLTDCQEVTATYRQALRVCSKLT
SEG
1ctpE
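
The LOW_COMPLEXITY percentage in the keyword summary can be related to the SEG lines above, where each `x` marks a masked residue. The sketch below assumes that the percentage is simply the number of masked residues divided by the peptide length; the function name `masked_percent` is hypothetical. The `x` runs in the SEG lines of this entry have lengths 12, 12, 15, 14 and 13.

```python
def masked_percent(seg_lines, peptide_length):
    """Percentage of residues marked 'x' across all SEG lines."""
    masked = sum(line.count("x") for line in seg_lines)
    return round(100.0 * masked / peptide_length, 2)

# SEG runs transcribed from the block above; peptide length is 628.
seg_lines = ["x" * 12, "x" * 12 + "..." + "x" * 15, "x" * 14, "x" * 13]
print(masked_percent(seg_lines, 628))  # 10.51
```

Under this assumption the 66 masked residues out of 628 reproduce the 10.51 % figure given in the report.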



Prosite for DKFZphtes3_7j3.2

PS00001 121->125 ASN_GLYCOSYLATION PDOC00001
PS00001 576->580 ASN_GLYCOSYLATION PDOC00001
PS00004 290->294 CAMP_PHOSPHO_SITE PDOC00004
PS00004 337->341 CAMP_PHOSPHO_SITE PDOC00004
PS00004 413->417 CAMP_PHOSPHO_SITE PDOC00004
PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005
PS00005 74->77 PKC_PHOSPHO_SITE PDOC00005
PS00005 82->85 PKC_PHOSPHO_SITE PDOC00005
PS00005 122->125 PKC_PHOSPHO_SITE PDOC00005
PS00005 142->145 PKC_PHOSPHO_SITE PDOC00005
PS00005 148->151 PKC_PHOSPHO_SITE PDOC00005
PS00005 289->292 PKC_PHOSPHO_SITE PDOC00005
PS00005 327->330 PKC_PHOSPHO_SITE PDOC00005
PS00005 339->342 PKC_PHOSPHO_SITE PDOC00005
PS00005 373->376 PKC_PHOSPHO_SITE PDOC00005
PS00005 377->380 PKC_PHOSPHO_SITE PDOC00005
PS00005 616->619 PKC_PHOSPHO_SITE PDOC00005
PS00006 15->19 CK2_PHOSPHO_SITE PDOC00006
PS00006 133->137 CK2_PHOSPHO_SITE PDOC00006
PS00006 148->152 CK2_PHOSPHO_SITE PDOC00006
PS00006 227->231 CK2_PHOSPHO_SITE PDOC00006
PS00006 293->297 CK2_PHOSPHO_SITE PDOC00006
PS00006 331->335 CK2_PHOSPHO_SITE PDOC00006
PS00006 377->381 CK2_PHOSPHO_SITE PDOC00006
PS00006 391->395 CK2_PHOSPHO_SITE PDOC00006
PS00006 461->465 CK2_PHOSPHO_SITE PDOC00006
PS00006 511->515 CK2_PHOSPHO_SITE PDOC00006
PS00006 523->527 CK2_PHOSPHO_SITE PDOC00006
PS00006 578->582 CK2_PHOSPHO_SITE PDOC00006
PS00006 606->610 CK2_PHOSPHO_SITE PDOC00006
PS00007 453->460 TYR_PHOSPHO_SITE PDOC00007
PS00007 453->461 TYR_PHOSPHO_SITE PDOC00007
PS00008 320->326 MYRISTYL PDOC00008
PS00008 324->330 MYRISTYL PDOC00008
PS00008 347->353 MYRISTYL PDOC00008
PS00008 360->366 MYRISTYL PDOC00008
PS00016 134->137 RGD PDOC00016
PS00107 59->82 PROTEIN_KINASE_ATP PDOC00100
PS00107 59->86 PROTEIN_KINASE_ATP PDOC00100
PS00108 171->184 PROTEIN_KINASE_ST PDOC00100
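
Several of the short Prosite motifs listed above can be reproduced with regular expressions. The sketch below is an illustration rather than the scanner used for this listing: it encodes three patterns, N-{P}-[ST]-{P} (PS00001), [ST]-x-[RK] (PS00005) and [ST]-x(2)-[DE] (PS00006), and reports hits as `start->end` with a 1-based start and an end one past the last matched residue, which is the convention the table appears to follow. The names `PATTERNS` and `scan_sites` are hypothetical.

```python
import re

# Prosite patterns translated to regular expressions (subset, for illustration).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
}

def scan_sites(peptide):
    """Return sorted (name, start, end) hits; start is 1-based, end is exclusive."""
    hits = []
    for name, pattern in PATTERNS.items():
        # The lookahead lets overlapping occurrences be reported.
        for m in re.finditer(r"(?=(%s))" % pattern, peptide):
            start = m.start() + 1
            hits.append((name, start, start + len(m.group(1))))
    return sorted(hits, key=lambda h: (h[1], h[0]))

# First 40 residues of the peptide for frame 2.
n_term = "MESLVFARRSGPTPSAAELARPLAEGLIKSPKPLMKKQAV"
for name, start, end in scan_sites(n_term):
    print(f"{start}->{end} {name}")
# 15->19 CK2_PHOSPHO_SITE
# 30->33 PKC_PHOSPHO_SITE
```

Both hits agree with the first CK2 and PKC rows of the table.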



Pfam for DKFZphtes3_7j3.2

HMM_NAME Eukaryotic protein kinase domain

HMM *YeigRilGeGsFGtVYkCiWrTGelVAIKIIkkrsms FIREI
YE+++++G+G++G+V+K+++ +G++VAIK I+K++++ ++REI
Query 53 YEFLETLGKGTYGKVKKARESSGRLVAIKSIRKDKIKDEQDLMHIRREI 101

HMM qlMRrLnHPNIIRFYDwFedddDHIYMIMEYMeGGDLFDYIrrngpMsEw
+ IM +LNHP+II + ++FE ++ I ++MEY+ GDL+DYI+++ ++SE+
Query 102 EIMSSLNHPHIIAIHEVFE-NSSKIVIVMEYASRGDLYDYISERQQLSER 150

HMM elrfIMyQILrGMeYLHSMgllHRDLKPENILIDeNgqIKIcDFGLARqM
E+R++++QI++++ y+H ++++HRDLK ENIL+D NG+IKI+DFGL+ ++
Query 151 EARHFFRQIVSAVHYCHQNRVVHRDLKLENILLDANGNIKIADFGLSNLY 200

HMM nnYerMttfCGTPWYMMAPEVIImg.nyYttkVDMWSFGCILWEMMTGep
+ + ++ TFCG+P Y +PE+ ++G +Y +++VD WS+G++L++++ G+
Query 201 HQGKFLQTFCGSPLYA-SPEI-VNGKPYTGPEVDSWSLGVLLYILVHGTM 248

HMM PFyddnMemlmrliqrfrrpfWpnCSeElyDFMrwCWnyDPekRPTFrQI
p F+++ ++ i + +++ +p S+ + ++RW++ ++P++R T +++
Query 249 PFDGHDHKILVKQISNGAYREPPKPSD-ACGLIRWLLMVNPTRRATLEDV 297

HMM LnHPWF*
H W+
Query 298 ASHWWV 303






PCT/IB00/01496 



DKFZphtes3_7j8



group: testes derived 

DKFZphtes3_7j8 encodes a novel 410 amino acid protein nearly identical to human
WUGSC:H_DJ1159O04.1.

The novel protein contains an additional C-terminal domain, which is not present in
WUGSC:H_DJ1159O04.1.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

WUGSC:H_DJ1159O04.1 similarity to YBL104p

verifies and extends the gene model WUGSC:H_DJ1159O04.1
similarity to S. cerevisiae YBL104p

Sequenced by BMFZ 

Locus: /map="7p21-p22" 

Insert length: 3353 bp 

Poly A stretch at pos. 3231, no polyadenylation signal found

1 GCAAAATATG TTGTATTTGT GGCATAGTTC ATATTTACAC TATCATAAAA 
51 TTATGGCCGA GAAGTTAAAT ATTCTAAATG TGTCAACATA GTTCTCTGTA 
101 AAACTGACTT ATTTTCCAAA TATATTTTGA AATAAAACAA TATAAAAATG 
151 TTTTCTGTTT TTAGGAATGG TGGAAAGCAG CAGACATAAT TGGAGTGGGT 
201 TGGATAAGCA AAGTGATATT CAAAATTTAA ATGAAGAGAG AATCTTAGCT 
251 TTACAGCTTT GTGGGTGGAT AAAGAAAGGA ACGGATGTAG ACGTGGGGCC 
301 ATTTTTGAAC TCCCTTGTAC AAGAAGGGGA ATGGGAAAGA GCTGCTGCTG 
351 TGGCATTGTT CAACTTGGAT ATTCGCCGAG CAATCCAAAT CCTGAATGAA 
401 GGGGCATCTT CTGAAAAAGG AGATCTGAAT CTCAATGTGG TAGCAATGGC 
451 TTTATCGGGT TATACGGATG AGAAGAACTC CCTTTGGAGA GAAATGTGTA
501 GCACACTGCG ATTACAGCTA AATAACCCGT ATTTGTGTGT CATGTTTGCA 
551 TTTCTGACAA GTGAAACAGG ATCTTACGAT GGAGTTTTGT ATGAAAACAA 
601 AGTTGCAGTA CGTGACAGAG TGGCATTTGC TTGTAAATTC CTTAGTGATA 
651 CTCAGTTAAA TAGATACATC GAAAAGTTGA CCAATGAAAT GAAAGAGGCT 
701 GGAAATTTGG AAGGAATTTT GCTTACAGGC CTTACTAAAG ATGGAGTGGA 

751 CTTAATGGAG AGTTATGTTG ATAGAACTGG AGATGTTCAA ACAGCAAGTT
801 ACTGTATGTT ACAGGGTTCA CCTTTAGATG TTCTTAAAGA TGAAAGGGTT 

851 CAGTACTGGA TTGAGAATTA TAGAAATTTA TTAGATGCCT GGAGGTTTTG
901 GCATAAACGA GCTGAATTTG ATATTCACAG GAGTAAGTTG GATCCCAGTT 
951 CCAAGCCTTT AGCACAAGTT TTTGTGAGTT GCAATTTCTG TGGCAAGTCA 

1001 ATCTCCTACA GCTGTTCAGC TGTGCCTCAT CAGGGCAGAG GTTTTAGTCA 
1051 GTATGGTGTG AGTGGCTCAC CAACGAAATC TAAAGTCACA AGTTGTCCTG 
1101 GCTGTCGAAA ACCACTTCCT CGATGTGCGC TTTGTCTCAT TAATATGGGA 
1151 ACACCAGTTT CTAGCTGTCC TGGAGGAACC AAATCAGATG AAAAAGTGGA 
1201 CTTGAGCAAG GACAAAAAAT TAGCCCAATT TAACAACTGG TTTACATGGT 
1251 GTCATAATTG CAGGCACGGT GGACATGCTG GACATATGCT TAGTTGGTTC 
1301 AGGGACCATG CAGAGTGCCC TGTGTCTGCA TGCACGTGTA AATGTATGCA 
1351 GTTGGATACA ACGGGGAATC TGGTACCTGC AGAGACTGTC CAGCCATAAA 
1401 ATGTTACCAC CTTAAGAGAA CCCTTCAAGT GTGGAGCTTT CTAGTAGGTG 
1451 TCCTTCATAG CTCAGAAACA TACCTCAGAA CAAGCCATTC ATGACTTACC
1501 TGTAATGGGA AAATAAATCA TTCTATCAGA TCAGCAGTTT TGATGTTTGA 
1551 GTGATTTTGA TATGCTTCAC AGAGACAAAT GCTGCCAAAA TAAACATCGA 
1601 AGTATAGACA TGAGTTCTGT TCAGCAGGTT GAAAAGTGTG ATTTAGAAAA 
1651 ACTTTCTAAG TTTTGGTTGA AATTATGAAC ACTCTAGAAG CAGAATTTCT 
1701 GGAAGAGCCA AGAACAGACT TTGAGCCTAT ATCTTCAAAG CTGAAACTGG 
1751 ATATCTTTCA ATAAAATATG TGCACTTTTA AAATAAAATG ACTAATTCTG 
1801 TGATTCAGAC AATAGTTTTA AGTTCAGCTG TGCTTAGATT TCTTTCAGAT 
1851 TAATTTAAAA TTATAGATTT TTACTTTTAG AATTGCAGAG CCCCTATCCC
1901 ACACTGGAGA ATATTTTTTA TTACTGTCTG TTATATATGT GTCTATGTGT 
1951 GTGTGTATAT TTATGTGTGT ATGTATAAAT ATGTACTTTT TAAAGGAGCC 
2001 TTTTCCCTCC TTTGATTTTA AGATAAGCAA TCTTTTGGCA TAACATTATC 
2051 GTCTTCCTAG AAAAGCCAAG ATGAAGAATC TATCTTACAA CTTTTTCTCT 
2101 TCAGTAGAGA AAAACATGTA CCATTTCAGG TGAACATACA AAATTTTCAC 
2151 TTTCTACCTT TTGCCTTCCA ATGTCCTGAT TTGTCTTCAA AGGTTTTTCT 
2201 CCATATTAAT TTGTCATCTT ATCCTCATCA CCTGAGAACA TTTTACTGCA 
2251 TACAAAGTCT ATGCAAGATT ATATGTAACT AGCCATTTAG TATAATCTAT 
2301 GTCAGTGTTT CTGTGCTGTC AAATTCCGTC CTGATTTGGA ATACCATACC 
2351 TTGTTCTTTC CAAGGTAGAC TAGGAAGTGT TGGGGAAATA GGGTCACTTC 
2401 AGAGACCATT TTAGATGTAA GTTTTTAAAT GTAAGTGTTA CTGGGGCTAA 
2451 GTCAGGGACT TTATTTAAAA CATTTTTTTT TTCTCATTTC ATAGCTAGAT 
2501 AGTTGTAAGA GAAATACAAA GAATTTACAA GATGCTTCTC TGTCATCTGC 
2551 CGTATGCAGA GGGACTGAAC TAGGAATTTT GTAGTTGAAG CTGTGTTCAT
2601 AAAGAGTAAA TCTTATTTTA TAGATTTTGG AGAAATAAAA CAAGAATTTT 
2651 AAGAGCTTTC GTATTAGCAG TTTTGCCTTA TAAAAACTAA GATTTGTCAG
2701 ATTAGTTTGA GGTGTAACCT AAATATTAAA AGTAGATTAA ATTTATTTTT 
2751 TACCTTGAGT GTCTGATACA TAAAACCCTT TTCTAGGAAA ACATTGGAAG 
2801 TAGTACATAT TTACTCTAAA TGTCTCACCT GCATGACAGT CTTTTCAAAT 
2851 GAAAGACATG GTAATTGCAA TTTTTTTTTA AAGATTGCTA TTAAGGGTAC 
2901 TTTTTCCAGC CTTCATTTGA GTAAATCTTA ATTGATTTCA TTTTATTAAC 
2951 ATATACCCTT TACCTTTAAT ATTTCATTTG AAGTGTTCCT TTCAAACTTA 
3001 CTGTCTTAAA TATGAAAGTC AGCTTTAAGT AATGTCAGAC TCATATGCAT 
3051 TTTCATTCTC ATTAGCTAAA GTAAAATGTA AAATTATCTC AAATAGTTAC 
3101 AAGTTTTGGA AATACAGTAT AAAACATGAA TGTAAAGTCT ATTATGTAAT 
3151 ATGCTTATTT GTAATCCTAA TATATGAGGG TGACATTTTT AAGATTGTAT 
3201 GTATGTGTCA ACCTCTTAAA TGTTTTCTGT GAAAAAAAAA AAAAAAAAAA 
3251 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3301 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3351 AAA 



BLAST Results 



No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 167 bp to 1396 bp; peptide length: 410 
Category: known protein 
Classification: unclassified 



1 MVESSRHNWS GLDKQSDIQN LNEERILALQ LCGWIKKGTD VDVGPFLNSL
51 VQEGEWERAA AVALFNLDIR RAIQILNEGA SSEKGDLNLN VVAMALSGYT
101 DEKNSLWREM CSTLRLQLNN PYLCVMFAFL TSETGSYDGV LYENKVAVRD
151 RVAFACKFLS DTQLNRYIEK LTNEMKEAGN LEGILLTGLT KDGVDLMESY
201 VDRTGDVQTA SYCMLQGSPL DVLKDERVQY WIENYRNLLD AWRFWHKRAE
251 FDIHRSKLDP SSKPLAQVFV SCNFCGKSIS YSCSAVPHQG RGFSQYGVSG
301 SPTKSKVTSC PGCRKPLPRC ALCLINMGTP VSSCPGGTKS DEKVDLSKDK
351 KLAQFNNWFT WCHNCRHGGH AGHMLSWFRD HAECPVSACT CKCMQLDTTG
401 NLVPAETVQP

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_7j8, frame 2

PIR:S45391 probable membrane protein YBL104C - yeast (Saccharomyces
cerevisiae), N = 2, Score = 446, P = 4.5e-47

TREMBL:AC004982_1 gene: "WUGSC:H_DJ1159O04.1"; Homo sapiens PAC clone
DJ1159O04 from 7p21-p22, complete sequence., N = 1, Score = 2038, P =
7.6e-211



>TREMBL:AC004982_1 gene: "WUGSC:H_DJ1159O04.1"; Homo sapiens PAC clone
DJ1159O04 from 7p21-p22, complete sequence.
Length = 379

HSPs:

Score = 2038 (305.3 bits), Expect = 7.6e-211, P = 7.6e-211
Identities = 379/379 (100%), Positives = 379/379 (100%)

Query:   1 MVESSRHNWSGLDKQSDIQNLNEERILALQLCGWIKKGTDVDVGPFLNSLVQEGEWERAA 60
           MVESSRHNWSGLDKQSDIQNLNEERILALQLCGWIKKGTDVDVGPFLNSLVQEGEWERAA
Sbjct:   1 MVESSRHNWSGLDKQSDIQNLNEERILALQLCGWIKKGTDVDVGPFLNSLVQEGEWERAA 60

Query:  61 AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN 120
           AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN
Sbjct:  61 AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN 120

Query: 121 PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN 180
           PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN
Sbjct: 121 PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN 180

Query: 181 LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD 240
           LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD
Sbjct: 181 LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD 240

Query: 241 AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG 300
           AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG
Sbjct: 241 AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG 300

Query: 301 SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT 360
           SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT
Sbjct: 301 SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT 360

Query: 361 WCHNCRHGGHAGHMLSWFR 379
           WCHNCRHGGHAGHMLSWFR
Sbjct: 361 WCHNCRHGGHAGHMLSWFR 379
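
The "Identities" figure of a BLAST high-scoring pair counts aligned positions at which query and subject carry the same residue. A minimal sketch of that count for one pair of aligned rows follows; the function name `count_identities` is hypothetical, `-` is assumed to be the gap character, and the "Positives" figure is not reproduced because it depends on a substitution matrix.

```python
def count_identities(query_row, subject_row):
    """Return (identical positions, alignment length) for two aligned rows."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have equal length")
    same = sum(1 for q, s in zip(query_row, subject_row) if q == s and q != "-")
    return same, len(query_row)

# Last aligned segment of the HSP above (residues 361-379).
same, total = count_identities("WCHNCRHGGHAGHMLSWFR", "WCHNCRHGGHAGHMLSWFR")
print(f"{same}/{total} ({100 * same // total}%)")  # 19/19 (100%)
```

Summing the counts over all segments of an HSP gives the overall identity ratio.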

Pedant information for DKFZphtes3_7j8, frame 2 

Report for DKFZphtes3_7j8.2

[LENGTH] 410

[MW] 45862.45

[pI] 6.51

[HOMOL] TREMBL:AC004982_1 gene: "WUGSC:H_DJ1159O04.1"; Homo sapiens PAC clone DJ1159O04
from 7p21-p22, complete sequence. 0.0

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YBL104c] 7e-48

[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins

[BLOCKS] BL00534A Ferrochelatase proteins

[PIRKW] transmembrane protein 2e-46

[KW] All_Alpha

SEQ MVESSRHNWSGLDKQSDIQNLNEERILALQLCGWIKKGTDVDVGPFLNSLVQEGEWERAA 
PRD cccccccccccccccccchhhhhhhhhhhhhhccccccccccccccccccccccchhhhh 

SEQ AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN 
PRD hhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhccc 

SEQ PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN 
PRD ccccceeeccccccccccceeeccchhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhcc 

SEQ LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD 
PRD cceeeeeeccccchhhhhhhhcccccceeeeeccccccccccchhhhhhhhhhhhhhhhh 

SEQ AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG 
PRD hhhhhhhhhhhhhhcccccccccceeeeeeeccccccccccccccccccccccccccccc 

SEQ SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT 
PRD ccccccccccccccccccceeeeecccccccccccccccccceeeehhhhhhhhhcceee 

SEQ WCHNCRHGGHAGHMLSWFRDHAECPVSACTCKCMQLDTTGNLVPAETVQP 
PRD eecccccccccchhhhhhhhhccccccccccccccccccccccccccccc 

(No Prosite data available for DKFZphtes3_7j8.2)
(No Pfam data available for DKFZphtes3_7j8.2)

DKFZphtes3_7p10



group: Cell Cycle 

DKFZphtes3_7p10.1 encodes a novel 422 amino acid putative protein, which is closely related to
the Xenopus laevis XPMC2 protein.

In fission yeast, the kinases Wee1 and Mik1 ensure that mitosis is initiated only after
completion of DNA synthesis. Yeast in which both Wee1 and Mik1 kinases are defective exhibit a
mitotic catastrophe phenotype. XPMC2 of Xenopus rescues several different yeast mitotic
catastrophe mutants defective in Wee1/Mik1 kinase function. The XPMC2 protein is localised in
the nucleus in Xenopus oocytes. The new protein is the human orthologue of XPMC2.

The new protein can find application in modulating/blocking the cell cycle. 



strong similarity to XPMC2 protein 
complete cDNA, complete cds, EST hits
Sequenced by BMFZ
Locus: /map="9q34"
Insert length: 2380 bp

Poly A stretch at pos. 2341, polyadenylation signal at pos. 2318



1 AGCGTGCGTG CTGAGGTATG CGCAACGCGT GCGGGGTCTC TTCCGGAGTC 
51 TTTTCCTGGA CGGGGTCCCT GCGGTGGGTG TGTTTCGGCC TGGCCTGGGC 
101 AGGCGCTTGT GCTGCCAGGG CGCCGGGCCC GGGGAGGCCG GGGTCTCGGG 
151 TGGCCGCCGG CCCAGGCGCT GGACGGCAGC AGGATGGGGA AGGCGAAGGT 
201 CCCCGCCTCC AAGCGCGCCC CGAGCAGCCC CGTGGCTAAG CCGGGTCCTG 
251 TCAAGACGCT CACTCGGAAG AAAAACAAGA AGAAAAAAAG GTTTTGGAAA 
301 AGCAAGGCGC GGGAAGTAAG CAAGAAGCCA GCAAGCGGCC CCGGTGCTGT 
351 GGTGCGACCT CCAAAGGCAC CAGAAGACTT TTCTCAAAAC TGGAAGGCGC 
401 TGCAAGAGTG GCTGCTGAAA CAAAAATCTC AGGCCCCAGA AAAGCCTCTT 
451 GTCATCTCTC AGATGGGTTC CAAAAAGAAG CCCAAAATTA TCCAGCAAAA 
501 CAAAAAAGAG ACCTCGCCTC AAGTGAAGGG AGAGGAGATG CCGGCAGGAA 
551 AAGACCAGGA GGCCAGCAGG GGCTCTGTTC CTTCAGGTTC CAAGATGGAC 
601 AGGAGGGCGC CAGTACCTCG CACCAAGGCC AGTGGAACAG AGCACAATAA 
651 GAAAGGAACC AAGGAAAGGA CAAATGGTGA TATTGTTCCA GAACGAGGGG 
701 ACATCGAGCA TAAGAAGCGG AAAGCTAAGG AGGCAGCCCC AGCCCCACCC 
751 ACCGAGGAAG ACATCTGGTT TGACGACGTG GACCCAGCGG ATATCGAAGC 
801 TGCCATAGGT CCAGAGGCGG CCAAGATAGC GAGGAAACAG TTGGGTCAGA 
851 GCGAGGGCAG CGTCAGCCTC AGCCTCGTGA AAGAGCAGGC CTTCGGCGGC 
901 CTGACAAGAG CCTTAGCCTT GGACTGTGAG ATGGTGGGCG TGGGCCCTAA 
951 GGGGGAGGAG AGCATGGCCG CCCGTGTGTC CATCGTGAAC CAGTATGGGA 
1001 AGTGCGTTTA TGACAAGTAC GTCAAACCAA CTGAGCCCGT GACGGACTAT 
1051 AGGACAGCGG TCAGTGGGAT TCGGCCTGAG AACCTCAAGC AGGGAGAAGA 
1101 GCTTGAAGTT GTTCAGAAGG AAGTGGCAGA GATGCTGAAG GGCAGAATTC 
1151 TAGTGGGGCA CGCTCTGCAT AATGACCTAA AGGTACTATT TCTTGATCAT 
1201 CCAAAAAAGA AGATTCGGGA CACACAGAAA TATAAACCTT TCAAGAGTCA 
1251 AGTAAAGAGT GGAAGGCCGT CTCTGAGACT ACTTTCAGAG AAGATCCTTG 
1301 GGCTCCAGGT CCAGCAGGCG GAGCACTGTT CAATTCAGGA TGCCCAGGCA 
1351 GCAATGAGGC TGTACGTCAT GGTGAAGAAG GAGTGGGACA GCATGGCCCG 
1401 AGACAGGCGC CCCCTGCTGA CTGCTCCAGA CCACTGCAGT GACGACGCCT 
1451 AGCAGTCCTG CCCTGCTGCT GCTGCCGCCC CGCTACAGAG GCAATGTGAC 
1501 CAGTCACAGG GACAGATCAC ATCTCCCCAG AGTGGCAACT CTGGTGAAAC 
1551 CTTTTCAGAA TCATGGCAGA GGGGCGTGGC GTGGTGCTAC TGAGAAGGTC 
1601 CTCCTTCCTC TTGACTTTGT GGTCTGAAAC CTGGTCTTAC TGTCCATGTG 
1651 TGTTTGGGCC CGGATGGTCA GGGTGGGGAG CAGGGACGGC CATGGGCACG 
1701 CCTGGCCACG CTTTACCGAC TGCTGACCCC CTGGGCCAGG TGAGGTTGGG 
1751 GCCTGTGGGC CGCCAGTCCA TACGGTGCTG TCACTGCCCA TCTTCGGTGA 
1801 CACCCTGGGG TGAGGTGCTC AGCACCTTCC TCTCGAGGAG CCACATTTTC 
1851 CTCCTTTGTG TTAGGGGACA TAACAAGCTC TGCTGGGCTT GAGGGACCCA 
1901 GACCAGGTGT CTGCAGTCAG CTCCTGAGAC ACAGCTGGCC GGCACAACAG 
1951 GTGTTACATC AGGGGTTTCC TGTGGCCGTT TGAACTTTGA GCATTTATCT 
2001 AAATTAAATT GGCCCAGGGT TGGCTGGTGG GTCACCCAGC AGAGGCTTCT 
2051 CCCCATAGCA CGAGGATGTG TTGCCTGGGC ACGGTGACTG CGGTTATTCC 
2101 TGGAGGTCGG CAGACATGCC AACCTTGGGC TATTTGAGCT GGAGAAGCTA 
2151 TGTGATGCTA GCCGGTGGCT TTCTGGGCTA GGCCCCAGTT TGAGGCTCCC 
2201 CTGGGAACTA GAGCCAGGAA CAGCCAGTGG CACTGACAAG GGGACGGAGT 
2251 CCAAGGCGTT ATTGGGCCAC CTGACAGCTG GACAGAAAAG GGGCAGACAC 
2301 ACCGAGGATG CGATTTAAAA TAAATGCAGA TGTTTACTTG GAAAAAAAAA 
2351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
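
Because the sequence is printed in numbered rows of 50 bases, it can be rejoined and sanity-checked against the stated insert length. The sketch below is one way to do that; the function name and the tolerance for digits split by stray spaces are assumptions introduced here, not part of the listing.

```python
import re

def parse_numbered_rows(lines):
    """Join rows of the form '<start> <block> <block> ...' into one sequence.

    The row number may contain stray spaces (e.g. '7 51'); it must equal the
    number of residues read so far plus one, otherwise ValueError is raised.
    """
    seq = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        m = re.match(r"^((?:\d\s*)+)([A-Za-z\s]+)$", line)
        if m is None:
            raise ValueError("unrecognised row: %r" % line)
        start = int(re.sub(r"\s+", "", m.group(1)))
        if start != len(seq) + 1:
            raise ValueError("row starts at %d, expected %d" % (start, len(seq) + 1))
        seq += re.sub(r"\s+", "", m.group(2)).upper()
    return seq

# Hypothetical two-row input; the second row number is split as OCR might do.
rows = ["1 ACGTACGTAC GTACGTACGT", "2 1 TTTT"]
joined = parse_numbered_rows(rows)
```

Applied to the full block above, the length of the returned string can be compared with the stated insert length of 2380 bp.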



BLAST Results 

Entry HSAC2099 from database EMBL: 

*** SEQUENCING IN PROGRESS *** Genomic sequence from Human 9q34; HTGS 
phase 1, 2 unordered pieces. 

Score = 5055, P = 0.0e+00, identities = 1011/1011 
8 exons Bp 104219-116190 



Medline entries 



95157530: 

Cloning and expression of a Xenopus gene that prevents mitotic 
catastrophe in fission yeast. 



Peptide information for frame 1 



ORF from 184 bp to 1449 bp; peptide length: 422 
Category: strong similarity to known protein 
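
The peptide length follows directly from the ORF coordinates: the span is inclusive and is divided into codons. A small sketch of that arithmetic is given here; the function name is hypothetical, and the assumption that the reported end position excludes the stop codon is inferred from the numbers rather than stated in the listing.

```python
def orf_peptide_length(start_bp, end_bp):
    """Number of codons in an ORF given inclusive 1-based coordinates."""
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span %d is not a positive multiple of 3" % span)
    return span // 3

# Coordinates reported for this entry.
length = orf_peptide_length(184, 1449)
```

Here (1449 - 184 + 1) / 3 = 1266 / 3 = 422, matching the stated peptide length.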



1 MGKAKVPASK RAPSSPVAKP GPVKTLTRKK NKKKKRFWKS KAREVSKKPA 
51 SGPGAVVRPP KAPEDFSQNW KALQEWLLKQ KSQAPEKPLV ISQMGSKKKP 
101 KIIQQNKKET SPQVKGEEMP AGKDQEASRG SVPSGSKMDR RAPVPRTKAS 
151 GTEHNKKGTK ERTNGDIVPE RGDIEHKKRK AKEAAPAPPT EEDIWFDDVD 
201 PADIEAAIGP EAAKIARKQL GQSEGSVSLS LVKEQAFGGL TRALALDCEM 
251 VGVGPKGEES MAARVSIVNQ YGKCVYDKYV KPTEPVTDYR TAVSGIRPEN 
301 LKQGEELEVV QKEVAEMLKG RILVGHALHN DLKVLFLDHP KKKIRDTQKY 
351 KPFKSQVKSG RPSLRLLSEK ILGLQVQQAE HCSIQDAQAA MRLYVMVKKE 
401 WESMARDRRP LLTAPDHCSD DA 

BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7p10, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_7p10, frame 1 



Report for DKFZphtes3_7p10.1 

[LENGTH] 422 
[MW] 46671.91 
[pI] 9.79 
[HOMOL] PIR:S53818 XPMC2 protein - African clawed frog 7e-96 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YOL080c] 2e-42 
[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YGR276c] 2e-19 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YGL094C] 7e-13 
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YGL094C] 7e-13 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR107w] 6e-10 
[PROSITE] RGD 1 
[PROSITE] MYRISTYL 4 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 6 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 8 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 11.37 % 



SEQ MGKAKVPASKRAPSSPVAKPGPVKTLTRKKNKKKKRFWKSKAREVSKKPASGPGAVVRPP 
SEG xxxxxxxxxxxxxxxxxx 
PRD cccccccccccccccccccccccccchhhhhhhhhhhhhhhhhccccccccccccccccc 

SEQ KAPEDFSQNWKALQEWLLKQKSQAPEKPLVISQMGSKKKPKIIQQNKKETSPQVKGEEMP 
SEG xxxxxxxxxxxx 
PRD cccccccchhhhhhhhhhhhhhhcccccccccccccccccceeeecccccccccccccee 

SEQ AGKDQEASRGSVPSGSKMDRRAPVPRTKASGTEHNKKGTKERTNGDIVPERGDIEHKKRK 
SEG xxxxxx 
PRD ecccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhh 

SEQ AKEAAPAPPTEEDIWFDDVDPADIEAAIGPEAAKIARKQLGQSEGSVSLSLVKEQAFGGL 
SEG xxxxxxxxxxxx 
PRD hhhhcccccccceeeecccccchhhhhhccchhhhhhhhhhcccccchhhhhhhhhhhhh 

SEQ TRALALDCEMVGVGPKGEESMAARVSIVNQYGKCVYDKYVKPTEPVTDYRTAVSGIRPEN 
SEG 
PRD hhhcccccccccccccchhhhhhhhhccccccceeeeeeecccccccccccccccccccc 

SEQ LKQGEELEVVQKEVAEMLKGRILVGHALHNDLKVLFLDHPKKKIRDTQKYKPFKSQVKSG 
SEG 
PRD ccccchhhhhhhhhhhhhhcceeeeccchhhhhhhhhcccccccccceeecccccccccc 

SEQ RPSLRLLSEKILGLQVQQAEHCSIQDAQAAMRLYVMVKKEWESMARDRRPLLTAPDHCSD 
SEG 
PRD chhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccc 

SEQ DA 
SEG 
PRD 



Prosite for DKFZphtes3_7p10.1 

PS00002 51->55 GLYCOSAMINOGLYCAN PDOC00002 
PS00004 107->111 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 156->160 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 9->12 PKC_PHOSPHO_SITE PDOC00005 
PS00005 27->30 PKC_PHOSPHO_SITE PDOC00005 
PS00005 46->49 PKC_PHOSPHO_SITE PDOC00005 
PS00005 96->99 PKC_PHOSPHO_SITE PDOC00005 
PS00005 347->350 PKC_PHOSPHO_SITE PDOC00005 
PS00005 359->362 PKC_PHOSPHO_SITE PDOC00005 
PS00005 363->366 PKC_PHOSPHO_SITE PDOC00005 
PS00005 368->371 PKC_PHOSPHO_SITE PDOC00005 
PS00006 136->140 CK2_PHOSPHO_SITE PDOC00006 
PS00006 150->154 CK2_PHOSPHO_SITE PDOC00006 
PS00006 163->167 CK2_PHOSPHO_SITE PDOC00006 
PS00006 190->194 CK2_PHOSPHO_SITE PDOC00006 
PS00006 383->387 CK2_PHOSPHO_SITE PDOC00006 
PS00006 413->417 CK2_PHOSPHO_SITE PDOC00006 
PS00007 343->351 TYR_PHOSPHO_SITE PDOC00007 
PS00007 342->351 TYR_PHOSPHO_SITE PDOC00007 
PS00008 130->136 MYRISTYL PDOC00008 
PS00008 151->157 MYRISTYL PDOC00008 
PS00008 221->227 MYRISTYL PDOC00008 
PS00008 239->245 MYRISTYL PDOC00008 
PS00016 171->174 RGD PDOC00016 

(No Pfam data available for DKFZphtes3_7p10.1) 
DKFZphtes3_7p9 



group: nucleic acid management 

DKFZphtes3_7p9 encodes a novel 691 amino acid protein with similarity to human nuclear domain 
10 protein NDP52. 

The nuclear domain 10 (ND10), also described as POD or Kr bodies, is involved in the development 
of acute promyelocytic leukemia and in virus-host interactions. The NDP52 protein is part of this 
complex structure. In vivo, NDP52 is transcribed in all human tissues, but is redistributed 
upon viral infection and interferon treatment. ND10 plays an important role in the viral life 
cycle. 

The novel protein is similar to NDP52. It contains three leucine zippers and an RGD cell 
attachment site. This protein seems to be a novel part of the ND10 complex. 

The new protein can find application in modulation of viral infections and tumour events. 



similarity to nuclear domain 10 protein NDP52 
complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 

Locus: /map="329.1 cR from top of Chrl2 linkage group" 
Insert length: 3003 bp 

Poly A stretch at pos. 2957, no polyadenylation signal found 



1 AAGGTGAGGG GAACAGCTGA TCCGTCTGTT GGGAGGACAG ATATCTCAAG 
51 GCCAGGATGG AAGAATCACC ACTAAGCCGG GCACCATCCC GTGGTGGAGT 
101 CAACTTTCTC AATGTAGCCC GGACCTACAT CCCCAACACC AAGGTGGAAT 
151 GTCACTACAC CCTTCCCCCA GGCACCATGC CCAGTGCCAG TGACTGGATT 
201 GGCATCTTCA AGGTGGAGGC TGCCTGTGTT CGGGATTACC ACACATTTGT 
251 GTGGTCTTCC GTGCCTGAAA GTACAACTGA TGGTTCCCCC ATTCACACCA 
301 GTGTCCAGTT CCAAGCCAGC TACCTGCCCA AACCAGGAGC TCAGCTCTAC 
351 CAGTTCCGAT ATGTGAACCG CCAGGGCCAG GTGTGTGGGC AGAGCCCCCC 
401 TTTCCAGTTC CGAGAGCCAA GGCCCATGGA TGAACTGGTG ACCCTGGAGG 
451 AGGCTGATGG GGGCTCTGAC ATCCTGCTGG TTGTCCCCAA GGCAACTGTG 
501 TTACAGAACC AGCTCGATGA GAGCCAGCAA GAACGGAATG ACCTGATGCA 
551 GCTGAAGCTA CAGCTGGAGG GACAGGTGAC AGAGCTGAGG AGCCGAGTGC 
601 AGGAGCTCGA GAGGGCTCTG GCAACTGCCA GGCAGGAGCA CACGGAGCTG 
651 ATGGAACAGT ACAAGGGGAT TTCCCGGTCC CATGGGGAGA TCACAGAAGA 
701 GAGGGACATC CTGAGCCGGC AACAGGGAGA CCATGTGGCA CGCATCCTGG 
751 AGCTAGAGGA TGACATCCAG ACCATCAGTG AGAAAGTGCT GACGAAGGAA 
801 GTGGAGCTGG ACAGGCTTAG AGACACAGTG AAGGCCCTGA CTCGGGAACA 
851 AGAGAAGCTC CTTGGGCAAC TGAAAGAAGT ACAAGCAGAC AAGGAGCAAA 
901 GTGAGGCTGA GCTCCAAGTG GCACAACAGG AGAACCATCA CTTAAATTTG 
951 GACCTGAAGG AGGCGAAGAG CTGGCAAGAG GAGCAGAGTG CTCAGGCTCA 
1001 GCGACTGAAA GACAAGGTGG CCCAGATGAA GGACACCCTA GGCCAGGCCC 
1051 AGCAGCGGGT GGCCGAGCTG GAGCCCTTGA AGGAGCAGCT TCGAGGGGCC 
1101 CAGGAGCTTG CAGCCTCAAG CCAGCAGAAA GCCACCCTTC TTGGGGAGGA 
1151 GTTGGCCAGC GCAGCAGCAG CCAGGGACCG CACCATAGCC GAACTACACC 
1201 GCAGCCGCCT GGAAGTGGCT GAAGTTAACG GCAGGCTGGC TGAGCTCGGT 
1251 TTGCACTTGA AGGAAGAAAA ATGCCAATGG AGCAAGGAGC GGGCAGGGCT 
1301 GCTGCAGAGT GTGGAGGCAG AGAAGGACAA GATCCTGAAG CTGAGTGCAG 
1351 AGATACTTCG ATTGGAGAAG GCAGTTCAGG AGGAGAGGAC CCAAAACCAA 
1401 GTGTTCAAGA CTGAGCTGGC CCGGGAGAAG GATTCTAGCC TGGTACAGTT 
1451 GTCAGAAAGT AAGCGGGAGC TGACAGAGCT GCGGTCAGCC CTGCGTGTGC 
1501 TCCAGAAGGA AAAGGAGCAG TTACAGGAGG AGAAACAGGA ATTGCTAGAG 
1551 TACATGAGAA AGCTAGAGGC CCGCCTGGAG AAGGTGGCAG ATGAGAAGTG 
1601 GAATGAGGAT GCCACCACAG AGGATGAGGA GGCCGCTGTG GGGCTGAGCT 
1651 GCCCGGCAGC TCTGACAGAC TCAGAGGACG AGTCCCCAGA AGACATGAGG 
1701 CTCCCACCCT ATGGCCTTTG TGAGCGTGGA GACCCAGGCT CCTCTCCTGC 
1751 TGGGCCTCGA GAGGCTTCTC CCCTTGTTGT CATCAGCCAG CCGGCTCCCA 
1801 TTTCTCCTCA CCTCTCTGGG CCAGCTGAGG ACAGTAGCTC TGACTCGGAG 
1851 GCTGAAGATG AGAAGTCAGT CCTGATGGCA GCTGTGCAGA GTGGGGGTGA 
1901 GGAGGCCAAC TTACTGCTTC CTGAACTGGG CAGTGCCTTC TATGACATGG 
1951 CCAGTGGCTT TACAGTGGGT ACCCTGTCAG AAACCAGCAC TGGGGGCCCT 
2001 GCCACCCCCA CATGGAAGGA GTGTCCTATC TGTAAGGAGC GCTTTCCTGC 
2051 TGAGAGTGAC AAGGATGCCC TGGAGGACCA CATGGATGGA CACTTCTTTT 
2101 TCAGCACCCA GGACCCCTTC ACCTTTGAGT GATCTTACTC CCTCGTACAT 
2151 GCACAAATAC ACACTCATGC ACACACACAC TCACACACAT GCATACACTT 
2201 AGGTTTCATG CCCATTTTCT ATCACACTGG GCTCCATGAT ATTCTGTTCC 
2251 CTAAGAACTG CTTCTGTGTG CCCTGTTTTC ATCCCAAGAT TTCTCACTTC 
2301 ATCCTCTCCT ACCTGGCTCT TTTGTCCCAG GGAGGGGTCC TGTTCGGAAG 
2351 CAGTGGCTGA ATTTATCCCC TGAAAGTGGT TTTGGAGGAA CCGGGATGGA 
2401 GGAGGCCTTC CCCTGTGGGA ATAGAATCGT CCACTCCTAG CCCTGGTTGC 
2451 TTCTGATACA CAGCCACTGC ACACACACAC TCACACTCAC ACTCCCTTGT 
2501 CTGATGCCCC AAAGCCAATT CCTGGGGCAC CCTACCCTCT CTTATTTGGA 
2551 GTTTCCGTTG GTTTACCTGA GTTTTCTCTG GGGTCTGCAC AGAGGCAGCA 
2601 GCATGGACAT CATGGCCTCT CAGGTCCCTT TTGGTTCTCA GTTTCATTGG 
2651 TTCCTCTTTC TGTTCCCCCA TTGACTTCTG TGCCCCACCC TAGCCTTTTC 
2701 CATAACCTTA GGTATTCAGT TTGGAGGGGT TTTTTGTATT TTTGAGGATT 
2751 CCTGTATTCT GTATCCTCTC CTCGCATCTC CTCACATGGA AAGAAATAAT 
2801 GTATTTGTGC CTTCTGTGAG GAATGGGGGG AACAAGTGGT CCCAGGTATC 
2851 CCCATTTCCA AGGCCCCCCT CCCTCTCCAG GTCCCCCCAC AGCAATAAAA 
2901 GCTTCCCCCT GATATCCATC CCTTTGTAGT TTGAACAAAT ATATTTATAT 
2951 GATATGTAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3001 AAA 



BLAST Results 



Entry HS189353 from database EMBL: 
human STS WI-11261. 
Score = 2191, P = 1.4e-92, identities = 463/485 



Medline entries 



95310349: 

Molecular characterization of NDP52, a novel protein of the 
nuclear domain 10, which is redistributed upon virus 
infection and interferon treatment. 

97375672: 

Cellular localization, expression, and structure of the nuclear 
dot protein 52. 



Peptide information for frame 3 



ORF from 57 bp to 2129 bp; peptide length: 691 
Category: similarity to known protein 
Prosite motifs: RGD (557-560) 
LEUCINE_ZIPPER (163-185) 
LEUCINE_ZIPPER (475-497) 
LEUCINE_ZIPPER (482-504) 
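
Motif hits of this kind can be reproduced with a simple pattern scan. The sketch below is illustrative only: the function and dictionary names are hypothetical, the patterns are the standard PROSITE definitions rewritten as regular expressions, and the convention of reporting the end as start plus motif length is inferred from the ranges in this listing (for example a three-residue RGD reported as 557-560).

```python
import re

# PROSITE patterns rewritten as Python regular expressions.
PATTERNS = {
    "RGD": r"RGD",                          # PS00016
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",    # PS00006: [ST]-x(2)-[DE]
    "LEUCINE_ZIPPER": r"L.{6}L.{6}L.{6}L",  # PS00029
}

def scan(seq, pattern, offset=0):
    """Return (start, end) pairs for every match, overlapping ones included.

    start is 1-based (plus offset, for scanning a fragment);
    end is start + match length, mirroring the ranges in the listing.
    """
    hits = []
    # A zero-width lookahead lets overlapping occurrences be reported.
    for m in re.finditer("(?=(%s))" % pattern, seq):
        start = m.start() + 1 + offset
        hits.append((start, start + len(m.group(1))))
    return hits

# Residues 551-560 and 151-190 of the peptide printed below.
rgd_hits = scan("PYGLCERGDP", PATTERNS["RGD"], offset=550)
zipper_hits = scan("NQLDESQQERNDLMQLKLQLEGQVTELRSRVQELERALAT",
                   PATTERNS["LEUCINE_ZIPPER"], offset=150)
```

The lookahead matters for this entry because two of the reported leucine zippers (475-497 and 482-504) overlap, and a plain non-overlapping search would report only the first.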



1 MEESPLSRAP SRGGVNFLNV ARTYIPNTKV ECHYTLPPGT MPSASDWIGI 
51 FKVEAACVRD YHTFVWSSVP ESTTDGSPIH TSVQFQASYL PKPGAQLYQF 
101 RYVNRQGQVC GQSPPFQFRE PRPMDELVTL EEADGGSDIL LVVPKATVLQ 
151 NQLDESQQER NDLMQLKLQL EGQVTELRSR VQELERALAT ARQEHTELME 
201 QYKGISRSHG EITEERDILS RQQGDHVARI LELEDDIQTI SEKVLTKEVE 
251 LDRLRDTVKA LTREQEKLLG QLKEVQADKE QSEAELQVAQ QENHHLNLDL 
301 KEAKSWQEEQ SAQAQRLKDK VAQMKDTLGQ AQQRVAELEP LKEQLRGAQE 
351 LAASSQQKAT LLGEELASAA AARDRTIAEL HRSRLEVAEV NGRLAELGLH 
401 LKEEKCQWSK ERAGLLQSVE AEKDKILKLS AEILRLEKAV QEERTQNQVF 
451 KTELAREKDS SLVQLSESKR ELTELRSALR VLQKEKEQLQ EEKQELLEYM 
501 RKLEARLEKV ADEKWNEDAT TEDEEAAVGL SCPAALTDSE DESPEDMRLP 
551 PYGLCERGDP GSSPAGPREA SPLVVISQPA PISPHLSGPA EDSSSDSEAE 
601 DEKSVLMAAV QSGGEEANLL LPELGSAFYD MASGFTVGTL SETSTGGPAT 
651 PTWKECPICK ERFPAESDKD ALEDHMDGHF FFSTQDPFTF E 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7p9, frame 3 

PIR:A56733 nuclear domain 10 protein NDP52 - human, N = 2, Score = 307, 
P = 7.7e-28 

TREMBL:AB008852_1 gene: "NOP"; product: "NDP52"; Bos taurus mRNA for 
NDP52, complete cds., N = 2, Score = 302, P = 4e-27 

TREMBL:AC004549_1 gene: "WUGSC:H_RG459N13.1"; product: "TXBP151"; Homo 
sapiens BAC clone RG459N13 from 7p15, complete sequence., N = 2, Score 
= 275, P = 2.3e-25 

PIR:G02043 TXBP151 - human, N = 2, Score = 270, P = 8.5e-25 

TREMBL:DM35816_4 gene: "zip"; product: "nonmuscle myosin-II heavy 
chain"; Drosophila melanogaster nonmuscle myosin-II heavy chain (zip) 
gene, complete cds., N = 1, Score = 254, P = 1.4e-17 



>PIR:A56733 nuclear domain 10 protein NDP52 - human 
Length = 446 

HSPs: 

Score = 307 (46.1 bits), Expect = 7.7e-28, Sum P(2) = 7.7e-28 
Identities = 104/323 (32%), Positives = 158/323 (48%) 

Query: 15 VNFLNVARTYIPNTKVECHYTLPPGTMPSASDWIGIFKVEAACVRDYHTFVWSSVPESTT 74 

V F +V + YIP V CHYT +P DWIGIF+V R+Y+TF+W ++P 
Sbjct: 23 VIFNSVEKFYIPGGDVTCHYTFTQHFIPRRKDWIGIFRVGWKTTREYYTFMWVTLPIDLN 82 

Query: 75 DGSPIHTSVQFQASYLPKPGAQLYQFRYVNRQGQVCGQSPPFQFREPRPMDELVTLEEAD 134 

+ S VQF+A YLPK + YQF YV+ G V G S PFQFR D LV + 

Sbjct: 83 NKSAKQQEVQFKAYYLPKDD-EYYQFCYVDEDGVVRGASIPFQFRPENEEDILWTTQ-- 139 

Query: 135 GGSDILLVVPKATVLQNQ-LDES QQERNDLMQLKLQLEGQVTE-LRSRVQELERALA 189 

C+ + K +NQ L +S Q++N MQ +LQ + + E L+S ++LE + 
Sbjct: 140 GEVEEIEQHNKELCKENQELKDSCISLQKQNSDMQAELQKKQEELETLQSINKKLELKVK 199 

Query: 190 TARQE-HTELMEQYKGISRSHGEITEERDI-LSRQQGDHVARILELEDDIQTISEKVLTK 247 

+ TEL+ Q K ++ E+ I + + Q + E+E +Q +K T+ 

Sbjct: 200 EQKDYWETELL-QLKEQNQKMSSENEKMGIRVDQLQAQLSTQEKEMEKLVQGDQDK — TE 256 

Query: 248 EVE-LDRLRDTVKALTREQEKLLGQLKEVQADKEQSEAELQVAQQENHHLNLDLKEAKSW 306 

++E L + D + EQ K +L++ +Q+E QQE N DL + S 

Sbjct: 257 QLEQLKKENDHLFLSLTEQRKDQKKLEQTVEQMKQNETTAMKKQQELMDENFDLSKRLSE 316 

Query: 307 QEEQSAQAQRLKDKVAQMKDTLGQAQQRV 335 

E QR K+++ D L + R+ 

Sbjct: 317 NEIICNALQRQKERLEGENDLLKRENSRL 345 

Score = 304 (45.6 bits), Expect = 2.1e-27, Sum P(2) = 2.1e-27 
Identities = 98/337 (29%), Positives = 163/337 (48%) 

Query: 15 VNFLNVARTYIPNTKVECHYTLPPGTMPSASDWIGIFKVEAACVRDYHTFVWSSVPESTT 74 

V F +V + YIP V CHYT +P DWIGIF+V R+Y+TF+W ++P 
Sbjct: 23 VIFNSVEKFYIPGGDVTCHYTFTQHFIPRRKDWIGIFRVGWKTTREYYTFMWVTLPIDLN 82 

Query: 75 DGSPIHTSVQFQASYLPKPGAQLYQFRYVNRQGQVCGQSPPFQFREPRPMDELVTLEEAD 134 

+ S VQF+A YLPK + YQF YV+ G V G S PFQFR P +E 
Sbjct: 83 NKSAKQQEVQFKAYYLPKDD-EYYQFCYVDEDGWRGASIPFQFR— PENE 130 

Query: 135 GGSDILLVVPKATVLQNQLDESQQERNDLMQLKLQLEGQVTELRSRVQELERALATARQE 194 

DIL+V Q +++E +Q +L + +L+ L+ + +++ L +QE 

Sbjct: 131 — EDILVVTT QGEVEEIEQHNKELCKENQELKDSCISLQKQNSDMQAELQK-KQE 182 

Query: 195 HTELMEQYKGISRSHGEITEERDILSRQQGDH-VARILELEDDIQTISEKVLTKEVELDR 253 

E ++ I ++ ++ ++Q D+ +L+L++ Q +S + + +D+ 

Sbjct: 183 ELETLQS INKKLELKVKEQKDYWETELLQLKEQNQKMSSENEKMGIRVDQ 232 

Query: 254 LRDTVKALTREQEKLL — GQLKEVQAD KEQSEAELQVAQQENHHLNLDLKEAKSWQE 308 

L+ + +E EKL+ Q K Q + KE L + +Q L+ + Q 

Sbjct: 233 LQAQLSTQEKEMEKLVQGDQDKTEQLEQLKKENDHLFLSLTEQRKDQKKLEQTVEQMKQN 292 

Query: 309 EQSA — QAQRLKDKVAQMKDTLGQAQQRVAELEPLKEQLRGAQEL 351 

E +A + Q L D+ + L + + L+ KE+L G +L 

Sbjct: 293 ETTAMKKQQELMDENFDLSKRLSENEIICNALQRQKERLEGENDL 337 

Score = 124 (18.6 bits), Expect = 2.3e-06, Sum P(2) = 2.3e-06 
Identities = 53/227 (23%), Positives = 113/227 (49%) 

Query: 138 DILLVVPKATVLQNQLDESQQERNDLMQLKLQLEGQVTELRSRVQELERALATARQEHTE 197 

DIL+V Q +++E +Q +L + +L+ L+ + +++ L +QE E 

Sbjct: 132 DILVVTT QGEVEEIEQHNKELCKENQELKDSCISLQKQNSDMQAELQK-KQEELE 185 

Query: 198 LMEQYKGISRSHGEITEERDILSRQQGDH-VARILELEDDIQTISEKVLTKEVELDRLRD 256 

++ I ++ ++ ++Q D+ +L+L++ Q +S + + +D+L+ 

Sbjct: 186 TLQS INKKLELKVKEQKDYWETELLQLKEQNQKMSSENEKMGIRVDQLQA 235 

Query: 257 TVKALTREQEKLLGQLKEVQADKEQSEAELQVAQQENHHLNLDLKEAKSWQEEQSAQAQR 316 

+ +E EKL VQ D++++E +L+ ++EN HL L L E + Q++ ++ 

Sbjct: 236 QLSTQEKEMEKL VQGDQDKTE-QLEQLKKENDHLFLSLTEQRKDQKKLEQTVEQ 288 

Query: 317 LK-DKVAQMKDTLGQAQQRVAELEPLKEQLRGAQELA-ASSQQKATLLGE 364 

+K ++ MK + Q+ + E L ++L + + A +QK L GE 
Sbjct: 289 MKQNETTAMK KQQELMDENFDLSKRLSENEI ICNALQRQKERLEGE 334 

Score = 103 (15.5 bits), Expect = 4.4e-04, Sum P(2) = 4.4e-04 
Identities = 63/278 (22%), Positives = 123/278 (44%) 

Query: 299 DLKEAKSWQEEQSAQAQRLKDKVAQMK DTLGQAQQRVAELEPLKEQLRGAQELAAS 354 

+++E + +E + Q LKD ++ D + Q++ ELE L + + EL 
Sbjct: 141 EVEEIEQHNKELCKENQELKDSCISLQKQNSDMQAELQKKQEELETL-QSINKKLELKVK 199 

Query: 355 SQQKATLLGEELASAAAARDRTIAELHRSRLEVAEVNGRLAELGLHLKEEKCQWSKERAG 414 

Q+ EL + +E + + V ++ +L+ + E+ Q +++ 

Sbjct: 200 EQKD — YWETELLQLKEQNQKMSSENEKMGIRVDQLQAQLSTQEKEM-EKLVQGDQDKTE 256 

Query: 415 LLQSVEAEKDKI-LKLSAEIL RLEKAVQEERTQNQVFKTELAREKDSSLVQLSESKR 470 

L+ ++ E D + L L+ + +LE+ V E+ QN+ T + ++ + + SKR 

Sbjct: 257 QLEQLKKENDHLFLSLTEQRKDQKKLEQTV-EQMKQNET — TAMKKQQELMDENFDLSKR 313 

Query: 471 ELTELRSALRVLQKEKEQLQEEKQELLEYMRKLEARLEKVADEKWNE DATTEDEEAA 527 

L+E LQ++KE+L+ E +LL ++ +RL +N T DE A 

Sbjct: 314 -LSENEIICNALQRQKERLEGEN-DLL---KRENSRLLSYMGLDFNSLPYQVPTSDEGGA 368 

Query: 528 — VGLSCPAALTD-SEDESPEDMRLPPYGLCERGDPGSSPAGPREASPL 573 

GL+ + E SP + + +C+ 0 ++ PL 

Sbjct: 369 RQNPGLAYGNPYSGIQESSSPSPLSIKKCPICKADDICDHTLEQQQMQPL 418 

Score = 64 (9.6 bits), Expect = 7.7e-28, Sum P(2) = 7.7e-28 
Identities = 13/29 (44%), Positives = 17/29 (58%) 

Query: 651 PTWKECPICKERFPAESDKDALEDHMDGH 679 

P CPIC + FPA ++K EDH+ H 
Sbjct: 417 PLCFNCPICDKIFPA-TEKQIFEDHVFCH 444 

Score = 64 (9.6 bits), Expect = 5.8e+00, Sum P(2) = 1.0e+00 
Identities = 26/90 (28%), Positives = 45/90 (50%) 

Query: 470 RELTELRSALRVLQKEKEQLQEE KQELLEYMRKLEARLE-KVADEK--W 515 

+E EL+ + LQK+ +Q E KQE LE ++ + +LE KV ++K W 
Sbjct: 154 KENQELKDSCISLQKQNSDMQAELQKKQEELETLQSINKKLELKVKEQKDYWETELLQLK 213 

Query: 516 --NEDATTEDEEAAVGLS-CPAALTDSEDE 542 

N+ ++E+E+ + + A L+ EE 
Sbjct: 214 EQNQKMSSENEKMGIRVDQLQAQLSTQEKE 243 

Score = 47 (7.1 bits), Expect = 4.6e-26, Sum P(2) = 4.6e-26 
Identities = 11/30 (36%), Positives = 17/30 (56%) 

Query: 631 MASGFTVGTLSETSTGGPATPTWKECPICK 660 

+AG + E+S+ P + K+CPICK 

Sbjct: 374 LAYGNPYSGIQESSSPSPLSI--KKCPICK 401 
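
The Identities and Positives percentages in the HSP headers above are simple ratios of matched columns to alignment length. The values printed in this listing are consistent with truncation rather than rounding (for example 158/323 is 48.9% but is shown as 48%); the helper below sketches that calculation under this assumption, and its name is hypothetical.

```python
def blast_percent(matches, aligned_length):
    """Integer percentage of matching columns, truncated toward zero."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length

# Figures from the first HSP against PIR:A56733.
identities = blast_percent(104, 323)
positives = blast_percent(158, 323)
```

Using integer arithmetic avoids floating-point edge cases and reproduces the other HSP headers in this entry as well, such as 63/278 shown as 22%.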



Pedant information for DKFZphtes3_7p9, frame 3 



Report for DKFZphtes3_7p9.3 



[LENGTH] 691 
[MW] 77336.52 
[pI] 4.77 
[HOMOL] PIR:A56733 nuclear domain 10 protein NDP52 - human 2e-29 
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 2e-11 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR356w] 2e-11 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 2e-11 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR356w] 2e-11 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 2e-11 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR309c] 2e-08 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-07 
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-07 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-07 
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YJL074c] 4e-07 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL250w] 4e-06 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YBR289w] 4e-06 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YBR289w] 4e-06 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YBR289w] 4e-06 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YNL250w] 4e-06 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YNL250w] 4e-06 
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1643] 1e-05 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR134c] 4e-05 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 4e-05 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL243w] 7e-05 
[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YNL243w] 7e-05 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YNL243w] 7e-05 
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL079c] 2e-04 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL079c] 2e-04 

[BLOCKS] BL00682B ZP domain proteins 
[EC] 3.6.1.32 Myosin ATPase 1e-13 
[PIRKW] nucleus 6e-10 
[PIRKW] phosphotransferase 2e-07 
[PIRKW] duplication 9e-07 
[PIRKW] citrulline 1e-09 
[PIRKW] tandem repeat 1e-13 
[PIRKW] heart 5e-11 
[PIRKW] endocytosis 5e-09 
[PIRKW] polymorphism 3e-06 
[PIRKW] cornified cell envelope 1e-06 
[PIRKW] transmembrane protein 6e-12 
[PIRKW] serine/threonine-specific protein kinase 2e-07 
[PIRKW] cell wall 1e-06 
[PIRKW] zinc finger 5e-09 
[PIRKW] metal binding 5e-09 
[PIRKW] DNA binding 8e-08 
[PIRKW] muscle contraction 1e-11 
[PIRKW] IgG constant region-binding 1e-06 
[PIRKW] acetylated amino end 4e-09 
[PIRKW] actin binding 1e-13 
[PIRKW] mitosis 9e-09 
[PIRKW] microtubule binding 9e-09 
[PIRKW] ATP 1e-13 
[PIRKW] thick filament 1e-10 
[PIRKW] phosphoprotein 1e-13 
[PIRKW] epidermis 1e-06 
[PIRKW] leucine zipper 1e-07 
[PIRKW] glycoprotein 4e-07 
[PIRKW] skeletal muscle 4e-10 
[PIRKW] disulfide bond 1e-07 
[PIRKW] calcium binding 1e-09 
[PIRKW] alternative splicing 1e-10 
[PIRKW] coiled coil 1e-13 
[PIRKW] P-loop 1e-13 
[PIRKW] heptad repeat 6e-10 
[PIRKW] methylated amino acid 1e-13 
[PIRKW] basement membrane 3e-06 
[PIRKW] immunoglobulin receptor 2e-07 
[PIRKW] peripheral membrane protein 5e-09 
[PIRKW] dimer 1e-07 
[PIRKW] cardiac muscle 1e-10 
[PIRKW] extracellular matrix 3e-06 
[PIRKW] hydrolase 1e-13 
[PIRKW] microtubule 6e-10 
[PIRKW] muscle 2e-09 
[PIRKW] membrane protein 3e-06 
[PIRKW] EF hand 1e-09 
[PIRKW] cytoskeleton 6e-12 
[PIRKW] hair 1e-09 
[PIRKW] calmodulin binding 5e-09 
[PIRKW] Golgi apparatus 3e-08 
[SUPFAM] myosin heavy chain 1e-13 
[SUPFAM] conserved hypothetical P115 protein 1e-08 
[SUPFAM] hypothetical protein YJL074C 5e-07 
[SUPFAM] centromere protein E 9e-09 
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 2e-07 
[SUPFAM] calmodulin repeat homology 1e-09 
[SUPFAM] myosin motor domain homology 1e-13 
[SUPFAM] alpha-actinin actin-binding domain homology 3e-13 
[SUPFAM] tropomyosin 3e-07 
[SUPFAM] plectin 3e-13 
[SUPFAM] trichohyalin 1e-09 
[SUPFAM] pleckstrin repeat homology 4e-06 

[SUPFAM] ribosomal protein S10 homology 3e-13 
[SUPFAM] giantin 3e-08 
[SUPFAM] protein kinase homology 2e-07 
[SUPFAM] protein kinase C zinc-binding repeat homology 4e-06 
[SUPFAM] involucrin 1e-06 
[SUPFAM] kinesin motor domain homology 9e-09 
[SUPFAM] human early endosome antigen 1 5e-09 
[SUPFAM] unassigned kinesin-related proteins 8e-08 
[SUPFAM] cytoskeletal keratin 3e-08 
[PROSITE] LEUCINE_ZIPPER 3 
[PROSITE] RGD 1 
[PROSITE] MYRISTYL 6 
[PROSITE] CK2_PHOSPHO_SITE 25 
[PROSITE] PKC_PHOSPHO_SITE 6 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 9.12 % 
[KW] COILED_COIL 39.36 % 
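
The [KW] percentages are presumably the share of residues flagged in the corresponding annotation track (x in the SEG row for low complexity, C in the COILS row for coiled coil), relative to the peptide length. The sketch below shows that calculation under this assumption; the function name and the example track are hypothetical, and the residue count of 272 is used purely as arithmetic illustration, not as a count read from the listing.

```python
def flagged_percent(track, flag, length):
    """Percentage of residues marked with `flag` in an annotation track.

    `length` is the full peptide length, so unmarked or missing positions
    in the track count as not flagged.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    flagged = sum(1 for ch in track if ch == flag)
    return round(100.0 * flagged / length, 2)

# Hypothetical 10-residue track with 6 masked positions.
demo = flagged_percent("..xx..xxxx", "x", 10)

# 272 flagged residues in a 691-residue peptide would give 39.36 %.
coiled = flagged_percent("C" * 272, "C", 691)
```

Because only the count of flagged characters is used, the calculation still works on tracks whose column alignment has been lost.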



SEQ MEESPLSRAPSRGGVNFLNVARTYIPNTKVECHYTLPPGTMPSASDWIGIFKVEAACVRD 

SEG 

PRD cccccccccccccceeeecceeeeeccccceeeeeccccccccccceeeeeeeeeecccc 

COILS 



SEQ YHTFVWSSVPESTTDGSPIHTSVQFQASYLPKPGAQLYQFRYVNRQGQVCGQSPPFQFRE 

SEG 

PRD eeeeeeeecccccccccchhhhhhhhhhhhccccccceeeeecccccccccccccccccc 

COILS 

SEQ PRPMDELVTLEEADGGSDILLVVPKATVLQNQLDESQQERNDLMQLKLQLEGQVTELRSR 

SEG 

PRD cccccceeehhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ VQELERALATARQEHTELMEQYKGISRSHGEITEERDILSRQQGDHVARILELEDDIQTI 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ SEKVLTKEVELDRLRDTVKALTREQEKLLGQLKEVQADKEQSEAELQVAQQENHHLNLDL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ KEAKSWQEEQSAQAQRLKDKVAQMKDTLGQAQQRVAELEPLKEQLRGAQELAASSQQKAT 

SEG xx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCC . . CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC . CCCCCCCCCCCCCCCCCCCC 

SEQ LLGEELASAAAARDRTIAELHRSRLEVAEVNGRLAELGLHLKEEKCQWSKERAGLLQSVE 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCC CCCCCCCCCCC 

SEQ AEKDKILKLSAEILRLEKAVQEERTQNQVFKTELAREKDSSLVQLSESKRELTELRSALR 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCC 

SEQ VLQKEKEQLQEEKQELLEYMRKLEARLEKVADEKWNEDATTEDEEAAVGLSCPAALTDSE 

SEG xxxxxxxxxxxxxxxxx xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ DESPEDMRLPPYGLCERGDPGSSPAGPREASPLVVISQPAPISPHLSGPAEDSSSDSEAE 

SEG xxxxxxxxxxx 

PRD hhhhccccccccccccccccccccccccccceeeeeeccccccccccccccccccccchh 

COILS 

SEQ DEKSVLMAAVQSGGEEANLLLPELGSAFYDMASGFTVGTLSETSTGGPATPTWKECPICK 

SEG xx 

PRD hhhhhhhhhhhhcccccccccccccccccccccccccccccccccccccccccccccccc 

COILS 

SEQ ERFPAESDKDALEDHMDGHFFFSTQDPFTFE 

SEG 

PRD cccccccchhhhhhhccccceeecccccccc 



COILS 

Prosite for DKFZphtes3_7p9.3 

PS00005 190->193 PKC_PHOSPHO_SITE PDOC00005 
PS00005 241->244 PKC_PHOSPHO_SITE PDOC00005 
PS00005 257->260 PKC_PHOSPHO_SITE PDOC00005 
PS00005 468->471 PKC_PHOSPHO_SITE PDOC00005 
PS00005 652->655 PKC_PHOSPHO_SITE PDOC00005 
PS00005 667->670 PKC_PHOSPHO_SITE PDOC00005 
PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006 
PS00006 43->47 CK2_PHOSPHO_SITE PDOC00006 
PS00006 68->72 CK2_PHOSPHO_SITE PDOC00006 
PS00006 72->76 CK2_PHOSPHO_SITE PDOC00006 
PS00006 129->133 CK2_PHOSPHO_SITE PDOC00006 
PS00006 156->160 CK2_PHOSPHO_SITE PDOC00006 
PS00006 208->212 CK2_PHOSPHO_SITE PDOC00006 
PS00006 239->243 CK2_PHOSPHO_SITE PDOC00006 
PS00006 282->286 CK2_PHOSPHO_SITE PDOC00006 
PS00006 305->309 CK2_PHOSPHO_SITE PDOC00006 
PS00006 376->380 CK2_PHOSPHO_SITE PDOC00006 
PS00006 383->387 CK2_PHOSPHO_SITE PDOC00006 
PS00006 468->472 CK2_PHOSPHO_SITE PDOC00006 
PS00006 520->524 CK2_PHOSPHO_SITE PDOC00006 
PS00006 537->541 CK2_PHOSPHO_SITE PDOC00006 
PS00006 539->543 CK2_PHOSPHO_SITE PDOC00006 
PS00006 543->547 CK2_PHOSPHO_SITE PDOC00006 
PS00006 593->597 CK2_PHOSPHO_SITE PDOC00006 
PS00006 595->599 CK2_PHOSPHO_SITE PDOC00006 
PS00006 597->601 CK2_PHOSPHO_SITE PDOC00006 
PS00006 612->616 CK2_PHOSPHO_SITE PDOC00006 
PS00006 639->643 CK2_PHOSPHO_SITE PDOC00006 
PS00006 652->656 CK2_PHOSPHO_SITE PDOC00006 
PS00006 667->671 CK2_PHOSPHO_SITE PDOC00006 
PS00006 683->687 CK2_PHOSPHO_SITE PDOC00006 
PS00008 39->45 MYRISTYL PDOC00008 
PS00008 107->113 MYRISTYL PDOC00008 
PS00008 204->210 MYRISTYL PDOC00008 
PS00008 414->420 MYRISTYL PDOC00008 
PS00008 561->567 MYRISTYL PDOC00008 
PS00008 613->619 MYRISTYL PDOC00008 
PS00016 557->560 RGD PDOC00016 
PS00029 163->185 LEUCINE_ZIPPER PDOC00029 
PS00029 475->497 LEUCINE_ZIPPER PDOC00029 
PS00029 482->504 LEUCINE_ZIPPER PDOC00029 

(No Pfam data available for DKFZphtes3_7p9.3) 
DKFZphtes3_8e24 



group: signal transduction 

DKFZphtes3_8e24.3 encodes a novel 658 amino acid putative GTP-binding protein, related to 
yeast YGL099w and mouse MMR1 putative GTP-binding proteins. 

GTP-binding proteins are involved in various signal transduction pathways, transferring the 
signal of a cellular receptor to an intracellular signal cascade. 

The new protein can find clinical application in modulating/blocking the response to a 
cellular receptor. 



strong similarity to guanine nucleotide binding proteins 

complete cDNA, complete cds, potential start at Bp 31, EST hits 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 3290 bp 

Poly A stretch at pos. 3269, polyadenylation signal at pos. 3251 



1 CGTCCAGCGG TCGTGTTGCC ATGGGCCGGA GGAGAGCCCC GGCCGGTGGG 
51 TCGCTGGGAC GGGCCCTTAT GCGCCATCAG ACTCAGCGGA GCCGAAGCCA 
101 TCGTCACACT GACTCCTGGT TGCACACAAG TGAACTCAAT GATGGCTATG 
151 ATTGGGGTCG TCTTAATCTT CAGTCAGTGA CTGAACAGAG CTCCCTTGAT 
201 GACTTCCTTG CTACTGCAGA ACTTGCAGGA ACAGAGTTTG TAGCTGAAAA 
251 ACTTAATATT AAGTTTGTGC CTGCTGAGGC TAGAACTGGA CTACTGTCTT 
301 TCGAGGAGAG CCAGAGAATT AAGAAGCTCC ATGAAGAAAA CAAACAGTTC 
351 TTGTGTATAC CGAGGAGACC AAACTGGAAC CAAAATACTA CCCCAGAAGA 
401 ACTCAAACAA GCAGAGAAAG ATAACTTTCT AGAATGGAGA CGTCAGCTTG 
451 TCCGGCTAGA AGAGGAACAG AAGCTGATAT TGACTCCATT TGAACGAAAT 
501 TTGGACTTTT GGCGCCAGCT CTGGAGAGTC ATTGAGAGAA GTGATATTGT 
551 GGTCCAGATA GTAGATGCTC GAAACCCACT CCTGTTTAGA TGTGAGGATT 
601 TGGAATGTTA TGTGAAAGAA ATGGATGCCA ATAAGGAGAA CGTCATTCTG 
651 ATCAACAAGG CAGACTTGCT GACTGCTGAG CAGCGGAGTG CCTGGGCCAT 
701 GTACTTCGAA AAAGAAGATG TGAAGGTTAT TTTCTGGTCA GCTTTGGCCG 
751 GAGCCATTCC CCTGAATGGT GACTCTGAGG AAGAGGCAAA CAGAGATGAT 
801 AGACAAAGCA ACACAACTGA GTTTGGACAT TCCAGTTTCG ACCAGGCTGA 
851 AATTTCCCAC AGTGAATCCG AACATCTCCC AGCTAGGGAT TCTCCTTCAC 
901 TTAGTGAAAA TCCCACAACG GATGAAGATG ACAGTGAGTA TGAGGACTGT 
951 CCAGAGGAGG AGGAAGACGA CTGGCAGACG TGCTCAGAAG AAGACGGTCC 
1001 CAAGGAAGAG GACTGCAGCC AGGACTGGAA GGAAAGCTCT ACTGCAGATT 
1051 CTGAGGCTCG GAGCAGGAAA ACCCCACAGA AGAGGCAGAT ACACAATTTT 
1101 AGCCATCTGG TATCCAAGCA GGAGTTACTG GAGCTCTTTA AGGAGCTACA 
1151 CACTGGGAGA AAGGTGAAAG ATGGGCAACT TACGGTCGGA CTGGTGGGCT 
1201 ACCCTAATGT TGGTAAGAGT TCAACAATCA ACACCATCAT GGGCAACAAG 
1251 AAAGTATCTG TGTCTGCCAC ACCTGGTCAC ACAAAGCACT TTCAGACTCT 
1301 CTATGTGGAG CCTGGCCTCT GCCTGTGTGA CTGTCCTGGC TTGGTGATGC 
1351 CATCTTTTGT GTCTACCAAG GCAGAAATGA CTTGCAGCGG AATCCTCCCA 
1401 ATTGATCAGA TGAGAGATCA TGTTCCTCCT GTATCACTAG TTTGCCAGAA 
1451 TATTCCAAGA CATGTTTTAG AAGCTACCTA TGGCATTAAC ATCATAACGC 
1501 CTAGAGAGGA TGAAGATCCC CACCGACCTC CAACATCGGA AGAACTGTTG 
1551 ACAGCTTATG GATACATGCG AGGATTCATG ACAGCGCATG GACAGCCAGA 
1601 CCAGCCTCGA TCTGCGCGCT ACATCCTGAA GGACTATGTC AGTGGTAAGC 
1651 TGCTGTACTG CCATCCTCCT CCTGGAAGAG ATCCTGTAAC TTTTCAGCAT 
1701 CAACACCAGC GACTCCTAGA GAACAAAATG AACAGTGATG AAATAAAAAT 
1751 GCAGCTAGGC AGAAATAAAA AAGCAAAGCA GATTGAAAAT ATCGTTGACA 
1801 AAACTTTTTT CCATCAAGAG AATGTGAGGG CTTTGACCAA AGGAGTCCAG 
1851 GCTGTGATGG GTTACAAGCC CGGGAGTGGT GTAGTGACTG CATCCACTGC 
1901 GAGCTCTGAG AACGGGGCGG GGAAGCCCTG GAAAAAACAT GGCAACAGAA 
1951 ATAAAAAAGA AAAAAGTCGT AGACTCTACA AGCACCTGGA TATGTGAGGT 
2001 TGGGCTGCAA CAGAAATGTC ATCTGCATTG TGCAGATGGA AAAGAGCAGA 
2051 AGCTGCCTGT TGCCTGTGGA AGTGTCCCAA GACACTAGCA CTGTAGAACG 
2101 GGCCCTGCTC TTGCAGAGCA CGGCTGCACC CAACAGTCTC CATGTCAAGA 
2151 CCAAGGGCCT CCTGGAAACA CCAGCTCTGA CAAAAAGGAG TCATCTGGGA 
2201 GCCCGAGAAT CCTACTCCTG GCCGGGCACA GTGGCTCACG CACCAACATG 
2251 GAGAAACCCC GTCTCTACTA AAAATACAAA AAAATTAGCC AGGCGTGGTG 
2301 GCGCGCACCT GTAATCCCAG CTACTCGGGA GGCTGAGGCA GGAGAATCAC 
2351 TTGAACCAGG GAGGCAGAGT TTGCAGTGAA TGGAGATTGC GCCGCTGCAC 
2401 TCCAGCCTGG GCGACAGAGT GAGACTGCAT CACAAGAAAA AAAATTTGCA 
2451 AGGGATGGTT CACGAGACAC ATTTGGGACG AAGGTGAAAG AGAAATTCCC 
2501 CATTCTGAGT GTCCTAGTTG GGTTCCTCCG ACTCTAAACA AGGGACTTGG 
2551 GTTCAGTTAG TGTACAGCGG GGGCTCACGT CCACTAAGGA ACATGTAGAA 
2601 TGTAACCACC GGGTGACAGG GAAGCTGCGG TATTTACTAC CTAGCCCCCA 
2651 TCTTCACTGG TTATTCCACT TATTTAAAAT GTCCAGAATA AGCAAATCTC 
2701 CATATAGAGG AAGTAGATTA GTGGTTGCTT CGGGATGGGA GGAATGGGAA 
2751 GATTGAGGTC TTTCTTTTGC AGTGATAAAA ATGTCCTAAA ATTGACTGTA 
2801 GCGATGGTCA CACAACTCTG AATATGCTTA AGACCATTGA ATTACACACT 
2851 TTACGTTGGT GAATTGTATG GTATGTAAAT TATAGTTCAA TAACATAGTT 
2901 ACAAAAGATA ATCAAAAGCA TGAAAGCACT ATTGATGTGG TTTGGATCTG 
2951 TGTCCTCACC GAGTCTCATG TTGAAATGTA AGCCCCCTGG TGGGAGGCGA 
3001 TGGGATTATG GGGCAGAGTC CTCACAAACG GTTTAGCACC ACCCGCTCAG 
3051 TGCTGTTCTC CTGATATTGA GTCCTCATCA CATCTGGTTG CTTCAAAGTG 
3101 TGTGGTGCCT CCCCTCTGTC TCCCTCCTGC TCTGGCCATA TAAGATGTGC 
3151 CTGCTTCTCC TTCGCCTTCT AACATGATTG TAAGTTTCCT GAGGCCTCCC 
3201 TAGAAGCAAA AGCTGCTGTG CTTCCTGTAC CATCTACTGG ACCGTGAGCC 
3251 AATTAAACCT CTTTTCTTTA TAAAAAAAAA AAAAAAAAGG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 21 bp to 1994 bp; peptide length: 658 
Category: strong similarity to known protein 
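
The ORF coordinates and the peptide length quoted above can be cross-checked with simple arithmetic. The sketch below assumes the coordinates are 1-based and inclusive and that the stop codon is not counted, which is what the reported numbers imply; the function names are illustrative and not part of the original report.

```python
def orf_codons(start_bp: int, end_bp: int) -> int:
    """Return the number of complete codons in an ORF.

    Assumes 1-based, inclusive nucleotide coordinates (an assumption
    inferred from the reported numbers, not stated in the report).
    """
    length = end_bp - start_bp + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return length // 3


def reading_frame(start_bp: int) -> int:
    """Return the forward reading frame (1, 2 or 3) of a 1-based start."""
    return (start_bp - 1) % 3 + 1


# ORF from 21 bp to 1994 bp, reported as frame 3 with peptide length 658
print(orf_codons(21, 1994), reading_frame(21))
```

Under these assumptions the 1974 nucleotides between positions 21 and 1994 give 658 codons, matching the stated peptide length, and a start at position 21 falls in frame 3.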



1 MGRRRAPAGG SLGRALMRHQ TQRSRSHRHT DSWLHTSELN DGYDWGRLNL 
51 QSVTEQSSLD DFLATAELAG TEFVAEKLNI KFVPAEARTG LLSFEESQRI 
101 KKLHEENKQF LCIPRRPNWN QNTTPEELKQ AEKDNFLEWR RQLVRLEEEQ 
151 KLILTPFERN LDFWRQLWRV IERSDIVVQI VDARNPLLFR CEDLECYVKE 
201 MDANKENVIL INKADLLTAE QRSAWAMYFE KEDVKVIFWS ALAGAIPLNG 
251 DSEEEANRDD RQSNTTEFGH SSFDQAEISH SESEHLPARD SPSLSENPTT 
301 DEDDSEYEDC PEEEEDDWQT CSEEDGPKEE DCSQDWKESS TADSEARSRK 
351 TPQKRQIHNF SHLVSKQELL ELFKELHTGR KVKDGQLTVG LVGYPNVGKS 
401 STINTIMGNK KVSVSATPGH TKHFQTLYVE PGLCLCDCPG LVMPSFVSTK 
451 AEMTCSGILP IDQMRDHVPP VSLVCQNIPR HVLEATYGIN IITPREDEDP 
501 HRPPTSEELL TAYGYMRGFM TAHGQPDQPR SARYILKDYV SGKLLYCHPP 
551 PGRDPVTFQH QHQRLLENKM NSDEIKMQLG RNKKAKQIEN IVDKTFFHQE 
601 NVRALTKGVQ AVMGYKPGSG VVTASTASSE NGAGKPWKKH GNRNKKEKSR 
651 RLYKHLDM 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8e24, frame 3 

SWISSPROT:YAWG_SCHPO HYPOTHETICAL GTP-BINDING PROTEIN C3F10.16C IN 
CHROMOSOME I., N = 3, Score = 560, P = 1.6e-111 

PIR:S64106 hypothetical protein YGL099w - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 544, P = 2.6e-105 

TREMBL:CEAF3143_1 gene: "C53H9.2"; Caenorhabditis elegans cosmid 
C53H9., N = 1, Score = 551, P = 2.9e-53 

SWISSPROT:MMR1_MOUSE POSSIBLE GTP-BINDING PROTEIN MMR1., N = 2, Score = 
311, P = 7.5e-31 



>SWISSPROT:YAWG_SCHPO HYPOTHETICAL GTP-BINDING PROTEIN C3F10.16C IN 
CHROMOSOME I. 

Length = 616 

HSPs: 

Score = 560 (84.0 bits), Expect = 1.6e-111, Sum P(3) = 1.6e-111 
Identities = 119/253 (47%), Positives = 163/253 (64%) 

Query: 12 LGRALMRHQTQRSRSHRHTDSWLHTSELNDGYDWGRLNLQSVTEQSSLDDFLATAELAGT 71 
LGRA+ T+ R+ + H + + R L+SVT ++ LD+FL TAEL 
Sbjct: 12 LGRAIQSDFTKNRRNRK--GGLKHIVDSDPKAH--RAALRSVTHETDLDEFLNTAELGEV 67 

Query: 72 EFVAEKLNIKFVP-AEARTGLLSFEESQRIKKLHEENKQFLCIPRRPNWNQNTTPEELKQ 130 
EF+AEK N+ + E LLS EE+ R K+ E+NK L IPRRP+W+Q TT EL + 
Sbjct: 68 EFIAEKQNVTVIQNPEQNPFLLSKEEAARSKQKQEKNKDRLTIPRRPHWDQTTTAVELDR 127 

Query: 191 CEDLECYVKEMDANKENVILINKADLLTAEQRSAWAMYFEKEDVKVIFWSALAGAIPLNG 250 
LE YVKE+ +K+N +L+NKAD+LT EQR+ W+ YF + ++ + F+SA A N 
Sbjct: 188 SAHLEQYVKEVGPSKKNFLLVNKADMLTEEQRNYWSSYFNENNIPFLFFSARMAA-EANE 246 

Query: 251 DSEEEANRDDRQSN 264 
E+ + SN 
Sbjct: 247 RGEDLETYESTSSN 260 

Score = 532 (79.8 bits), Expect = 1.6e-111, Sum P(3) = 1.6e-111 
Identities = 131/323 (40%), Positives = 192/323 (59%) 

Query: 340 STADSEARSRKTPQKRQIHNFSHLVSKQELLELFKELHTGRKVKDGQ--LTVGLVGYPNV 397 
ST+ +E + +H+ S + + + L + F++ + + DG+ +T GLVGYPNV 
Sbjct: 256 STSSNEIPESLQADENDVHS-SRIATLKVLEGIFEKFAS--TLPDGKTKMTFGLVGYPNV 312 

Query: 398 GKSSTINTIMGNKKVSVSATPGHTKHFQTLYVEPGLCLCDCPGLVMPSFVSTKAEMTCSG 457 
GKSSTIN ++G+KKVSVS+TPG TKHFQT+ + + L DCPGLV PSF +T+A++ G 
Sbjct: 313 GKSSTINALVGSKKVSVSSTPGKTKHFQTINLSEKVSLLDCPGLVFPSFATTQADLVLDG 372 

Query: 458 ILPIDQMRDHVPPVSLVCQNIPRHVLEATYGINI-ITPREDEDPHRPPTSEELLTAYGYM 516 
+LPIDQ+R++ P +L+ + IP+ VLE Y I I I P E E P+++E+L + 
Sbjct: 373 VLPIDQLREYTGPSALMAERIPKEVLETLYTIRIRIKPIE-EGGTGVPSAQEVLFPFARS 431 

Query: 517 RGFMTAH-GQPDQPRSARYILKDYVSGKLLYCHPPPG--RDPVTFQHQHQRLLENKMNSD 573 
RGFM AH G PD R+AR +LKDYV+GKLLY HPPP F +H + + + SD 
Sbjct: 432 RGFMRAHHGTPDDSRAARILLKDYVNGKLLYVHPPPNYPNSGSEFNKEHHQKIVSA-TSD 490 

Query: 574 EIKMQLGR---NKKAKQIEN-IVDKTFFHQEN--VRALTKGVQAVM-G--YKPGSGVVTA 624 
I +L R + E+ +VD +F QEN VR + KG M G YK + + 
Sbjct: 491 SITEKLQRTAISDNTLSAESQLVDDEYF-QENPHVRPMVKGTAVAMQGPVYKGRNTMQPF 549 

Query: 625 STASSENGAGK-PWKKHGNRNKKEKSRRL 652 
+++ + K P G + K+R+L 
Sbjct: 550 QRRLNDDASPKYPMNAQGKPLSRRKARQL 578 

Score = 47 (7.1 bits), Expect = 1.3e-60, Sum P(3) = 1.3e-60 
Identities = 21/84 (25%), Positives = 35/84 (41%) 

Query: 552 GRDPVTFQHQHQRLLENKMNSDEIKMQLGRNKKAKQIENIVDKTFFHQENVRALTKGVQA 611 
G D T++ + + +DE + R K +E I +K F TK 
Sbjct: 248 GEDLETYESTSSNEIPESLQADENDVHSSRIATLKVLEGIFEK--FASTLPDGKTKMTFG 305 

Query: 612 VMGYKPGSGVVTASTASSENGAGK 635 
++GY P G +ST ++ G+ K 
Sbjct: 306 LVGY-PNVG--KSSTINALVGSKK 326 

Score = 43 (6.5 bits), Expect = 1.6e-111, Sum P(3) = 1.6e-111 
Identities = 7/13 (53%), Positives = 9/13 (69%) 

Query: 638 KKHGNRNKKEKSR 650 

KKH +NK+ K R 
Sbjct: 596 KKHNKKNKRSKQR 608 



Query: 131 AEKDNFLEWRRQLVRLEEEQKLILTPFERNLDFWRQLWRVIERSDIVVQIVDARNPLLFR 190 
E+++FL WRR L +L++ + I+TPFERNL+ WRQLWRVIERSD+VVQIVDARNPL FR 
Sbjct: 128 MERESFLNWRRNLAQLQDVEGFIVTPFERNLEIWRQLWRVIERSDVVVQIVDARNPLFFR 187 
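
The identity and positive percentages in the HSP headers above are ratios of matching columns to aligned columns. Judging from the figures in this report (for example 131/323, which is 40.6%, is printed as 40%), the values appear to be truncated rather than rounded; that is an inference from the numbers, not a documented property of the program used. A minimal sketch under that assumption, with hypothetical helper names:

```python
def percent(matches: int, columns: int) -> int:
    """Integer percentage, truncated (assumed to mirror the report)."""
    return 100 * matches // columns


def count_identities(query: str, sbjct: str) -> int:
    """Count columns where two aligned, equal-length strings agree.

    Gap characters ('-') are never counted as identities.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


# Shortest HSP of the YAWG_SCHPO hit (Query 638-650, Sbjct 596-608)
n = count_identities("KKHGNRNKKEKSR", "KKHNKKNKRSKQR")
print(n, percent(n, 13))
```

The same helper reproduces the header values 119/253 (47%) and 163/253 (64%) of the first HSP.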



Pedant information for DKFZphtes3_8e24, frame 3 



Report for DKFZphtes3_8e24.3 



[LENGTH] 658 
[MW] 75226.58 
[pI] 5.86 
[HOMOL] SWISSPROT:YAWG_SCHPO HYPOTHETICAL GTP-BINDING PROTEIN C3F10.16C IN CHROMOSOME I. 5e-56 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL099w] 3e-55 
[FUNCAT] r general function prediction [M. jannaschii, MJ1464] 1e-16 
[FUNCAT] 08.16 extracellular transport [S. cerevisiae, YER006w] 3e-09 
[PIRKW] P-loop 1e-27 
[PIRKW] GTP binding 1e-27 
[SUPFAM] conserved hypothetical protein MG442 7e-08 
[PROSITE] ATP_GTP_A 1 
[PROSITE] MYRISTYL 3 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 19 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 10 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 4.56 % 



SEQ MGRRRAPAGGSLGRALMRHQTQRSRSHRHTDSWLHTSELNDGYDWGRLNLQSVTEQSSLD 

SEG xxxxxxxxxxxxx 

PRD cccccccccccchhhhhhhhhhhccccccccccccccccccccccchhhhhhhhccccch 

SEQ DFLATAELAGTEFVAEKLNIKFVPAEARTGLLSFEESQRIKKLHEENKQFLCIPRRPNWN 

SEG 

PRD hhhhhhhhhhheeeecccceeeeeeccccccchhhhhhhhhhhhhhhhhhhccccccccc 

SEQ QNTTPEELKQAEKDNFLEWRRQLVRLEEEQKLILTPFERNLDFWRQLWRVIERSDIVVQI 

SEG 

PRD cccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhcceeeee 

SEQ VDARNPLLFRCEDLECYVKEMDANKENVILINKADLLTAEQRSAWAMYFEKEDVKVIFWS 

SEG 

PRD eccccccccchhhhhhhhhhhccccceeeeecccchhhhhhhhhhhhhhhhccceeeeec 

SEQ ALAGAIPLNGDSEEEANRDDRQSNTTEFGHSSFDQAEISHSESEHLPARDSPSLSENPTT 

SEG 

PRD cccccccccccchhhhhhhhhhcccccccccccccccccccccccccccccccccccccc 

SEQ DEDDSEYEDCPEEEEDDWQTCSEEDGPKEEDCSQDWKESSTADSEARSRKTPQKRQIHNF 

SEG xxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccchhhhhhhhhccccccccccc 

SEQ SHLVSKQELLELFKELHTGRKVKDGQLTVGLVGYPNVGKSSTINTIMGNKKVSVSATPGH 

SEG 

PRD ccccchhhhhhhhhhhhhhhccccceeeeeecccccccccceeeeccccceeeeeccccc 

SEQ TKHFQTLYVEPGLCLCDCPGLVMPSFVSTKAEMTCSGILPIDQMRDHVPPVSLVCQNIPR 

SEG 

PRD cceeeeeeeccceeecccccccccccchhhhhhhhccccccccccccccceeeeecccch 

SEQ HVLEATYGINIITPREDEDPHRPPTSEELLTAYGYMRGFMTAHGQPDQPRSARYILKDYV 

SEG 

PRD hhhhhhhhccccccccccccccccchhhhhhhhhhhhhhcccccccccchhhhhhhhhcc 

SEQ SGKLLYCHPPPGRDPVTFQHQHQRLLENKMNSDEIKMQLGRNKKAKQIENIVDKTFFHQE 

SEG 

PRD ccceeeeccccccccccchhhhhhhhhhcccchhhhhhhhcchhhhhhhhhhhhccccch 

SEQ NVRALTKGVQAVMGYKPGSGVVTASTASSENGAGKPWKKHGNRNKKEKSRRLYKHLDM 

SEG 

PRD hhhhhhhceeeeeecccccceeecccccccccccccccccccccchhhhhhhhhhccc 
PS00001  264->268  ASN_GLYCOSYLATION  PDOC00001 
PS00001  359->363  ASN_GLYCOSYLATION  PDOC00001 
PS00004  410->414  CAMP_PHOSPHO_SITE  PDOC00004 
PS00005  21->24    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  26->29    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  97->100   PKC_PHOSPHO_SITE   PDOC00005 
PS00005  348->351  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  378->381  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  448->451  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  493->496  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  531->534  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  541->544  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  649->652  PKC_PHOSPHO_SITE   PDOC00005 
PS00006  52->56    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  57->61    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  93->97    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  123->127  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  155->159  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  252->256  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  271->275  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  279->283  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  281->285  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  293->297  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  299->303  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  305->309  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  320->324  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  322->326  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  340->344  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  365->369  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  449->453  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  493->497  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  505->509  CK2_PHOSPHO_SITE   PDOC00006 
PS00007  480->488  TYR_PHOSPHO_SITE   PDOC00007 
PS00007  190->198  TYR_PHOSPHO_SITE   PDOC00007 
PS00008  9->15     MYRISTYL           PDOC00008 
PS00008  432->438  MYRISTYL           PDOC00008 
PS00008  620->626  MYRISTYL           PDOC00008 
PS00009  1->5      AMIDATION          PDOC00009 
PS00009  378->382  AMIDATION          PDOC00009 
PS00017  393->401  ATP_GTP_A          PDOC00017 



(No Pfam data available for DKFZphtes3_8e24.3) 
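
The ATP_GTP_A entry in the PROSITE table above (393->401) refers to the ATP/GTP-binding site motif A, or P-loop, whose PROSITE pattern PS00017 is [AG]-x(4)-G-K-[ST]. The sketch below scans a peptide fragment for this pattern with a regular expression. It is an illustration only and not the tool used to produce the report; the function name, the fragment boundaries and the offset convention are assumptions.

```python
import re

# PROSITE PS00017, [AG]-x(4)-G-K-[ST], written as a regular expression.
# The lookahead lets overlapping occurrences be reported as well.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(peptide: str, offset: int = 1):
    """Return (1-based start position, matched residues) for each hit.

    `offset` is the sequence position of the first residue of `peptide`,
    so that hits in a fragment are reported in full-length coordinates.
    """
    return [(m.start() + offset, m.group(1)) for m in P_LOOP.finditer(peptide)]


# Residues 391-410 of the DKFZphtes3_8e24 frame 3 peptide
fragment = "LVGYPNVGKSSTINTIMGNK"
print(find_p_loops(fragment, offset=391))
```

The hit starts at residue 393, in agreement with the table; the eight matched residues span 393 to 400, so the table's end coordinate appears to be one past the last matched residue.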
DKFZphtes3_8g11 



group: testes derived 

DKFZphtes3_8g11 encodes a novel proline-rich 939 amino acid protein without similarity to 
known proteins. 

The novel protein contains an ATP/GTP-binding site motif A (P-loop). 

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown, proline-rich protein 
1 EST hit (from testis library) 
Sequenced by MediGenomix 
Locus : unknown 
Insert length: 3100 bp 

Poly A stretch at pos. 3056, polyadenylation signal at pos. 3041 

1 AGAGTCTTCC CTCAGCATAT TTTACGATAG AGAAGATCTT GTTCCAATGG 
51 AAGAAAGTGA GGACTCACAG AGTGATTCCC AGACAAGGAT TTCTGAGTCC 
101 CAACACTCCC TCAAGCCAAA TTATCTTTCC CAGGCCAAGA CTGACTTCTC 
151 AGAACAGTTC CAGTTGCTAG AAGATCTGCA GCTAAAAATA GCAGCAAAAC 
201 TCTTAAGGAG TCAAATACCC CCCGATGTGC CTCCACCTCT AGCTTCAGGT 
251 CTAGTCCTAA AATACCCTAT CTGCCTACAG TGTGGCCGAT GTTCAGGACT 
301 TAATTGCCAT CATAAATTAC AGACCACTTC GGGGCCTTAT CTTCTTATCT 
351 ATCCACAGCT CCACCTTGTA CGCACTCCTG AAGGCCATGG TGAGGTTCGG 
401 TTGCATCTTG GCTTTAGGCT GAGAATTGGG AAAAGATCCC AAATCTCAAA 
451 GTATCGTGAA AGAGATAGAC CCGTCATACG GAGAAGCCCT ATATCACCAT 
501 CACAAAGGAA AGCTAAAATC TATACTCAAG CTTCCAAGAG TCCTACTTCC 
551 ACAATAGATT TGCAGTCTGG GCCTTCCCAG TCCCCTGCTC CTGTACAAGT 
601 CTACATCAGG CGAGGACAAC GCAGCAGGCC TGACTTAGTA GAAAAGACAA 
651 AAACTAGAGC ACCTGGGCAC TATGAATTCA CTCAAGTTCA CAACCTACCA 
701 GAGAGTGACT CTGAAAGCAC TCAGAATGAA AAACGGGCTA AAGTGAGAAC 
751 CAAAAAGACC TCTGATTCAA AATATCCAAT GAAGAGAATC ACCAAGCGAC 
801 TTAGAAAACA CAGAAAGTTC TACACAAACA GTAGAACCAC AATAGAGAGT 
851 CCTTCTAGGG AATTAGCAGC CCATTTAAGA AGGAAGAGGA TTGGAGCAAC 
901 TCAGACAAGT ACTGCCTCTT TAAAAAGACA ACCTAAGAAA CCTTCCCAAC 
951 CCAAGTTCAT GCAACTGCTT TTTCAGAGCC TAAAGCGGGC ATTCCAAACA 
1001 GCACACAGAG TTATAGCTTC TGTTGGGCGG AAGCCTGTGG ACGGGACAAG 
1051 GCCAGACAAT TTGTGGGCAA GCAAAAACTA TTATCCAAAA CAAAATGCCA 
1101 GGGACTATTG CTTACCAAGC AGTATCAAAA GAGACAAGAG GTCAGCTGAC 
1151 AAGCTAACGC CAGCAGGCTC AACCATTAAG CAGGAGGACA TATTGTGGGG 
1201 AGGAACGGTC CAGTGCAGAT CAGCTCAACA GCCAAGAAGA GCTTACTCTT 
1251 TCCAACCCAG ACCTCTTCGA CTGCCCAAGC CCACAGATTC CCAAAGTGGT 
1301 ATTGCTTTCC AAACTGCCTC AGTGGGGCAG CCTCTGAGAA CTGTTCAAAA 
1351 GGACAGTAGT AGCAGATCAA AGAAAAACTT CTATAGAAAT GAAACCTCCA 
1401 GCCAGGAGTC TAAGAACTTG TCCACACCAG GAACCAGAGT TCAGGCCCGA 
1451 GGAAGAATCC TACCTGGTTC CCCTGTGAAG AGAACCTGGC ACCGACATCT 
1501 TAAAGACAAA CTCACACACA AGGAGCATAA CCACCCCAGC TTCTATAGGG 
1551 AGAGAACCCC ACGCGGTCCT TCTGAGAGAA CCCGTCATAA CCCCTCTTGG 
1601 AGAAACCATC GCAGTCCCTC TGAGAGAAGC CAACGCAGTT CCTTGGAGAG 
1651 AAGACATCAC AGTCCCTCTC AGAGGAGCCA CTGCAGTCCC TCTAGGAAAA 
1701 ACCATTCCAG TCCTTCTGAG AGAAGCTGGC GCAGTCCGTC TCAGAGAAAT 
1751 CACTGCAGTC CCCCCGAGAG GAGCTGTCAC AGTCTCTCTG AAAGGGGCCT 
1801 TCACAGTCCC TCTCAGAGGA GCCATCGCGG TCCCTCTCAG AGAAGACATC 
1851 ACAGTCCCTC AGAGAGAAGC CATCGCAGTC CCTCAGAGAG AAGCCATCGC 
1901 AGTCCCTCTG AGAGAAGACA TCGCAGTCCC TCCCAGAGGA GCCATCGCGG 
1951 TCCCTCAGAG AGAAGCCATT GCAGTCCCTC TGAGAGAAGA CATCGCAGTC 
2001 CCTCTCAGAG GAGCCATCGT GGTCCCTCTG AGAGAAGACA TCACAGTCCC 
2051 TCTAAGAGAA GCCATCGCAG TCCCGCTCGG AGGAGCCATC GCAGTCCCTC 
2101 AGAGAGAAGC CATCACAGTC CCTCTGAGAG AAGCCATCAC AGTCCCTCTG 
2151 AGAGAAGACA TCACAGTCCC TCTGAGAGAA GCCATTGCAG TCCCTCTGAG 
2201 AGAAGCCATT GCAGTCCCTC TGAGAGAAGA CATCGCAGTC CCTCTGAGAG 
2251 AAGACATCAC AGTCCCTCAG AGAAAAGCCA TCACAGTCCC TCTGAGAGAA 
2301 GCCATCACAG TCCCTCTGAG AGAAGACGTC ACAGTCCCTT GGAGAGGAGC 
2351 CGTCACAGTC TCTTGGAGAG GAGCCATCGC AGTCCCTCTG AGAGGAGATC 
2401 TCACAGGTCC TTTGAGAGGA GCCATCGTAG GATTTCTGAG AGAAGTCACA 
2451 GTCCCTCAGA GAAGAGCCAC CTCAGTCCCT TGGAAAGAAG CCGTTGCAGT 
2501 CCCTCTGAGA GGAGAGGACA CAGTTCCTCT GGGAAAACCT GTCACAGTCC 
2551 CTCTGAGAGA AGCCATCGCA GTCCCTCCGG GATGAGGCAA GGGAGGACCT 
2601 CTGAGAGGAG CCATCGCAGT TCCTGTGAGA GAACCCGTCA CAGTCCCTCT 
2651 GAGATGAGGC CAGGGAGGCC CTCTGGGAGG AACCATTGCA GTCCCTCTGA 
2701 GAGGAGCCGA CGCAGTCCCC TTAAGGAGGG ACTCAAGTAC AGTTTCCCTG 
2751 GAGAGAGGCC CAGCCATAGT TTGTCTAGAG ATTTCAAGAA TCAAACAACT 
2801 CTCCTCGGGA CCACACATAA AAATCCCAAA GCAGGGCAAG TGTGGAGGCC 
2851 TGAAGCTACT CGATGAGGCG AGGTCCGCCC CTATTATTCA TTGTCCTAAG 
2901 TCTTCATCGT GCTGCCCTTT CCAGGCTTCT TTCCTGCTCA GCCACTGCCT 
2951 CCAATTCCTG CGCCCCCAGC GTGGAAAGGC TTCCATTTCT CTCTACCGGG 
3001 GGGGAGGCGG GTGAGAATGG GTCTGTAATT TCTCTAAGAT GAATAAAGGG 
3051 GCAGTTAATT AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAGG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 47 bp to 2863 bp; peptide length: 939 
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: ATP_GTP_A (824-832) 
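
The peptide listed for frame 2 can be reproduced from the nucleotide sequence by conceptual translation starting at the first base of the ORF. The sketch below uses the standard genetic code and assumes 1-based coordinates; the helper name `translate` is illustrative, and the report does not state which program performed the translation.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, start_bp: int = 1) -> str:
    """Translate `dna` from 1-based position `start_bp` to the first stop.

    Incomplete trailing codons are ignored; codons containing characters
    other than T, C, A, G are rendered as 'X'.
    """
    seq = dna.upper().replace(" ", "")
    peptide = []
    for i in range(start_bp - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":  # stop codon ends the peptide
            break
        peptide.append(aa)
    return "".join(peptide)


# First 70 nucleotides of the insert; the ORF starts at position 47
head = ("AGAGTCTTCC CTCAGCATAT TTTACGATAG AGAAGATCTT GTTCCAATGG "
        "AAGAAAGTGA GGACTCACAG")
print(translate(head, start_bp=47))
```

For this fragment the translation begins with the ATG at positions 47 to 49 and yields the first eight residues of the listed peptide.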



1 MEESEDSQSD SQTRISESQH SLKPNYLSQA KTDFSEQFQL LEDLQLKIAA 
51 KLLRSQIPPD VPPPLASGLV LKYPICLQCG RCSGLNCHHK LQTTSGPYLL 
101 IYPQLHLVRT PEGHGEVRLH LGFRLRIGKR SQISKYRERD RPVIRRSPIS 
151 PSQRKAKIYT QASKSPTSTI DLQSGPSQSP APVQVYIRRG QRSRPDLVEK 
201 TKTRAPGHYE FTQVHNLPES DSESTQNEKR AKVRTKKTSD SKYPMKRITK 
251 RLRKHRKFYT NSRTTIESPS RELAAHLRRK RIGATQTSTA SLKRQPKKPS 
301 QPKFMQLLFQ SLKRAFQTAH RVIASVGRKP VDGTRPDNLW ASKNYYPKQN 
351 ARDYCLPSSI KRDKRSADKL TPAGSTIKQE DILWGGTVQC RSAQQPRRAY 
401 SFQPRPLRLP KPTDSQSGIA FQTASVGQPL RTVQKDSSSR SKKNFYRNET 
451 SSQESKNLST PGTRVQARGR ILPGSPVKRT WHRHLKDKLT HKEHNHPSFY 
501 RERTPRGPSE RTRHNPSWRN HRSPSERSQR SSLERRHHSP SQRSHCSPSR 
551 KNHSSPSERS WRSPSQRNHC SPPERSCHSL SERGLHSPSQ RSHRGPSQRR 
601 HHSPSERSHR SPSERSHRSP SERRHRSPSQ RSHRGPSERS HCSPSERRHR 
651 SPSQRSHRGP SERRHHSPSK RSHRSPARRS HRSPSERSHH SPSERSHHSP 
701 SERRHHSPSE RSHCSPSERS HCSPSERRHR SPSERRHHSP SEKSHHSPSE 
751 RSHHSPSERR RHSPLERSRH SLLERSHRSP SERRSHRSFE RSHRRISERS 
801 HSPSEKSHLS PLERSRCSPS ERRGHSSSGK TCHSPSERSH RSPSGMRQGR 
851 TSERSHRSSC ERTRHSPSEM RPGRPSGRNH CSPSERSRRS PLKEGLKYSF 
901 PGERPSHSLS RDFKNQTTLL GTTHKNPKAG QVWRPEATR 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8g11, frame 2 

TREMBL:AF061185_1 gene: "car90"; product: "cyst germination specific 
acidic repeat protein precursor"; Phytophthora infestans cyst 
germination specific acidic repeat protein precursor (car90) gene, 
complete cds., N = 1, Score = 457, P = 2.3e-39 

TREMBL:AC004561_38 gene: "F16P2.41"; product: "putative proline-rich 
protein"; Arabidopsis thaliana chromosome II BAC F16P2 genomic 
sequence, complete sequence., N = 1, Score = 340, P = 4.2e-27 

TREMBL:AF062655_1 product: "plenty-of-prolines-101"; Mus musculus 
plenty-of-prolines-101 mRNA, complete cds., N = 1, Score = 313, P = 
3.6e-24 

PIR:PN0099 son3 protein - human (fragment), N = 1, Score = 292, P = 
1.2e-22 



>TREMBL:AF061185_1 gene: "car90"; product: "cyst germination specific acidic 
repeat protein precursor"; Phytophthora infestans cyst germination 
specific acidic repeat protein precursor (car90) gene, complete cds. 
Length = 1,489 

HSPs: 

Score = 457 (68.6 bits), Expect = 2.3e-39, P = 2.3e-39 
Identities = 91/444 (20%), Positives = 239/444 (53%) 

Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533 

+p + T + +++ T+ ++ E TP P+E T + P+ +P+E + +S 

Sbjct: 584 APTEETMYAPIEET-TYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAST 642 

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593 

E ++ p + + + +p+ + p+E + + P + + +P E + ++ +E ++P++ + 
Sbjct: 643 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 702 

Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653 

p++ + p+E + +p+E + +P+E + P + + GP+E + +P+E +P+ 
Sbjct: 703 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPT 762 

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713 

+ + P+E + P+ + +P + +P+E + ++P+E + ++P+E + P+E + 
Sbjct: 763 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETT 822 

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773 

+p+E + p+E +P+E ++P+E++ ++P+E++ ++P+E ++P E + + 
Sbjct: 823 YAPTEETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPT 882 

Query: 774 ERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTC 832 

E + +P++ ++ E + + E +++P+E++ +P E + P+E ++ + +T 
Sbjct: 883 EETTYAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 942 

Query: 833 HSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRSPL 892 

++P+E + +P+ +E + + E T + P+E P+ +P+E + +P+ 

Sbjct: 943 YAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPI 1002 

Query: 893 KEGLKYSFPGERPSHSLSRDFKNQTT 918 

+E Y+ P E +++ + + + T 
Sbjct: 1003 EE-TTYA-PTEETTYAPAEETPYEPT 1026 

Score = 445 (66.8 bits), Expect = 4.5e-38, P = 4.5e-38 
Identities = 83/394 (21%), Positives = 212/394 (53%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E TP P+E T + P+ +P+E + + E ++P++ + +P+ + P+E + 

Sbjct: 763 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETT 822 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+P++ p E + ++ +E ++P++ + P+++ ++P+E + +P+E + P+ 

Sbjct: 823 YAPTEETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPT 882 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 

E +P++ + P+E + + +E +P++ + P+E + P++ + +P + 
Sbjct: 883 EETTYAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 942 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

+P+E + ++P+E + ++P+E ++P+E + P+E + +P+E +P+E ++P 
Sbjct: 943 YAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPI 1002 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 

E ++ ++P+E + ++P+E + P E + ++ E + +P+E ++ S E + + E + 
Sbjct: 1003 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETT 1062 

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 

++P+E++ P E + +P+E ++ + +T ++P+E + +P+ +E + 

Sbjct: 1063 YAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPT 1122 

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKE 894 

E T ++P+E P+ +P E + P+E 

Sbjct: 1123 EETTYAPTEETTYAPTEETMYAPIEETTYGPTEE 1156 

Score = 439 (65.9 bits), Expect = 2.0e-37, P = 2.0e-37 
Identities = 86/421 (20%), Positives = 223/421 (52%) 

Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533 
+P + T + +K T+ ++ E TP P+E T + P+ +P+E + +S 
Sbjct: 848 APTEETTYAPT-EKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTYAST 906 

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593 
E ++P++ + +P+ + P+E + +P++ +P E + ++ +E ++P++ + 
Sbjct: 907 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 966 

Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653 
P+ + + P+E + +P+E + + P+E +P + + P+E + +P+E P+ 
Sbjct: 967 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPT 1026 

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713 
+ + P+E ++P++ + + + +P+E + ++P+E + + P+E ++P+E + 
Sbjct: 1027 EETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 1086 

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773 
+P+E + +P+E +P+E ++P+E++ + P+E + ++P+E ++P E + ++ + 
Sbjct: 1087 YAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPI 1146 

Query: 774 ERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTC 832 
E + P+E ++ E + + E ++P+E++ P + + P+E ++ + +T 
Sbjct: 1147 EETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPTEETT 1206 

Query: 833 HSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRSPL 892 
++P+E + +P+ +E + + E T + P+E P+ +P+E + +P 
Sbjct: 1207 YAPTEETTYAPTEETPYEPTEETTYAPTEETTYEPTEETTYAPTEETTYAPTEETTYAPT 1266 

Query: 893 KE 894 
+E 
Sbjct: 1267 EE 1268 

Score = 439 (65.9 bits), Expect = 2.0e-37, P = 2.0e-37 
Identities = 91/434 (20%), Positives = 232/434 (53%) 

Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533 
+P + T + +K T+ ++ E TP P+E T + P+ +P+E + +S 
Sbjct: 440 APTEETTYAPT-EKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTYAST 498 

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593 
E ++P++ + +P+ + P+E + +P++ +P E + ++ +E ++P++ + 
Sbjct: 499 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 558 

Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653 
P++ + P+E + +P+E + +P+E +P + + P+E + +P+E P+ 
Sbjct: 559 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPT 618 

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713 
+ + P+E ++P++ + + + +P+E + ++P+E + + P+E ++P+E + 
Sbjct: 619 EETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 678 

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773 
+P+E + +P+E +P+E ++P+E++ + P+E + ++P+E ++P E + ++ + 
Sbjct: 679 YAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPI 738 

Query: 774 ERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTC 832 
E + P+E ++ E + + E ++P+E++ P + +P+E ++ + +T 
Sbjct: 739 EETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPTEETT 798 

Query: 833 HSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRSPL 892 
+ + P+E ++P TE + +ET ++P+E P P+ +P+E + +P 
Sbjct: 799 YAPTEETTYAP-------TEETPYEPT-EETTYAPTEETPYEPTEETTYTPTEETTYAPT 850 

Query: 893 KEGLKYSFPGERPSHS 908 
+E Y+ P E+ +++ 
Sbjct: 851 EE-TTYA-PTEKTTYA 864 

Score = 437 (65.6 bits), Expect = 3.3e-37, P = 3.3e-37 
Identities = 85/417 (20%), Positives = 223/417 (53%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E TP P+E T + P+ +P+E + + E+ ++P++ + +P+ + P+E + 

Sbjct: 419 EETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETT 478 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+p ++ +P E + ++ +E ++P++ + P++ + P+E + +P+E + +P+ 
Sbjct: 479 YAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPT 538 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 

E +P++ + P+E + +P+E P++ + P+E ++P++ + +P + 

Sbjct: 539 EETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETT 598 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

+ P+E + ++P+E + + P+E ++P+E + +P+E + + +E +P+E ++P+ 
Sbjct: 599 YAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPA 658 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 

E++ + p+E + ++P+E ++P E + ++ E + +P+E ++ E + + E + 
Sbjct: 659 EETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETT 718 
Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 

++P+E++ +P E + +P E + + +T ++P+E + +P+ +E + 

Sbjct: 719 YAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPT 778 

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKEGLKYSFPGERPSHSLSRDFKNQTT 918 

T ++P+E P+ +P+E + +P +E Y P E +++ + + + T 

Sbjct: 779 GETTYAPTEETTYAPTEETTYAPTEETTYAPTEE-TPYE-PTEETTYAPTEETPYEPT 834 

Score = 428 (64.2 bits), Expect = 3.1e-36, P = 3.1e-36 
Identities = 89/440 (20%), Positives = 228/440 (51%) 

Query: 473 PGSPVKRTWHRHLKDKLTHKEHNHPSFYR-ERTPRGPSERTRHNPSWRNHRSPSERSQRS 531 
P P + T + K+ T+ ++ E T P+E T + P+ P+E + + 
Sbjct: 470 PYEPTEETTYAPTKET-TYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYA 528 

Query: 532 SLERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQR 591 
E ++P++ + +P+ + +P+E + +P++ P E + ++ +E ++P++ 
Sbjct: 529 PTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEE 588 

Query: 592 SHRGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRS 651 
+ P + ++P+E + +P+E + P+E +P++ + P+E + + +E + 
Sbjct: 589 TMYAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYA 648 

Query: 652 PSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSER 711 
P++ + P+E + P++ + +P + +P+E + ++P+E + ++P+E ++P+E 
Sbjct: 649 PTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEE 708 

Query: 712 SHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHS 771 
+ P+E + +P+E +P+E ++P E++ + P+E + ++P+E ++P E + ++ 
Sbjct: 709 TPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYA 768 

Query: 772 LLERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGK 830 
E + P+ ++ E + + E +++P+E++ +P E + P+E ++ + + 
Sbjct: 769 PTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEE 828 

Query: 831 TCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRS 890 
T + P+E + +P+ +E + + E+T ++P+E P+ P+E + + 
Sbjct: 829 TPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETTYA 888 

Query: 891 PLKEGLKYSFPGERPSHSLSRD 912 
P KE Y+ P E +++ + + 
Sbjct: 889 PTKE-TTYA-PTEETTYASTEE 908 




Score = 427 (64.1 bits), Expect = 4.0e-36, P = 4.0e-36 
Identities = 81/394 (20%), Positives = 213/394 (54%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 
E T GP+E T + P+ +P+E + + E + P+ + +P+ + +P+E + 
Sbjct: 739 EETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPTEETT 798 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 
+P++ +P E + + +E ++P++ + P++ ++P+E + +P+E + +P+ 
Sbjct: 799 YAPTEETTYAPTEETPYEPTEETTYAPTEETPYEPTEETTYTPTEETTYAPTEETTYAPT 858 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 
E+ +P++ + P+E + P+E +P++ + P+E ++ ++ + +P + 
Sbjct: 859 EKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTYASTEETTYAPTEETT 918 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 
+P+E + + P+E + ++P+E ++P+E + +P+E + +P+E +P+E + P+ 
Sbjct: 919 YAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPT 978 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 
E++ ++P+E + ++P+E ++P+E + ++ E + +P+E + E + + E + 
Sbjct: 979 EETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 1038 

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 
++P+E++ + E + +P+E ++ + +T + P+E + +P+ +E + + 
Sbjct: 1039 YAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPT 1098 

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKE 894 
E T ++P+E P+ P+E + +P +E 
Sbjct: 1099 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEE 1132 




Score = 424 (63.6 bits), Expect = 8.5e-36, P = 8.5e-36 
Identities = 81/394 (20%), Positives = 210/394 (53%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E T P+E T + P+ +P+E + + E + P++ + +P+ + +P+E + 

Sbjct: 939 EETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETM 998 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 
+P + +P E + ++ +E + P++ + P++ ++P+E + + +E + + P+ 
Sbjct: 999 YAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPT 1058 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 

E + P++ + P+E + +P+E + P++ + P+E + + P++ + + PA + 
Sbjct: 1059 EETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETP 1118 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

p+E + ++P+E + ++P+E ++P E + P+E + + P+E + P+E ++P+ 
Sbjct: 1119 YEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPT 1178 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 

E++ + P+ + ++P+E ++P E + ++ E + +P+E + E + + E + 
Sbjct: 1179 EETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEETT 1238 

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 

+ p+E++ +P E + +P+E ++ + +T ++P + + P+ +E + + 

Sbjct: 1239 YEPTEETTYAPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPT 1298 

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKE 894 

E T ++P+E P+G +P+E + + P +E 

Sbjct: 1299 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEE 1332 

Score = 422 (63.3 bits), Expect = 1.4e-35, P = 1.4e-35 
Identities = 84/407 (20%), Positives = 216/407 (53%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E T P+E T + P+ P+E + + E + P++ + +P+ + +P+E + 

Sbjct: 795 EETTYAPTEETTYAPTEETPYEPTEETTYAPTEETPYEPTEETTYTPTEETTYAPTEETT 854 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+p+++ +p E + ++ +E + P++ + P++ ++P+E + + +E + +P+ 
Sbjct: 855 YAPTEKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTYASTEETTYAPT 914 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 

E +P++ + P+E + +P+E +P++ + P+E ++P++ + +PA + 
Sbjct: 915 EETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETP 974 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

P+E + ++P+E + ++P+E ++P E + +P+E + +P+E P+E ++P+ 

Sbjct: 975 YEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPT 1034 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 

E++ ++P+E + ++ +E ++P E + ++ E + P+E ++ E + + E + 
Sbjct: 1035 EETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETT 1094 

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 

++P+E+ + +P E + +P+E + + +T ++P+E + +P+ E + 

Sbjct: 1095 YAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPT 1154 

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKEGLKYSFPGERPSHS 908 

E T ++P+E P+ +P+E + P E Y+ P E +++ 

Sbjct: 1155 EETTYAPTEATTYAPTEETPYAPTEETTYEPTGE-TTYA-PTEETTYA 1200 

Score = 421 (63.2 bits), Expect = 1.8e-35, P = 1.8e-35 
Identities = 86/418 (20%), Positives = 219/418 (52%) 



Query: 491 HKEHNHPSFYRERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSR 550 

H H E T P+E T + P+ +P+E + + E + P++ + +P+ 
Sbjct: 376 HYAHIEKPCDTEVTMYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYTPTE 435 

Query: 551 KNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHR 610 

+ +P+E + +P+++ +P E + ++ +E + P++ + P++ ++P+E + 
Sbjct: 436 ETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTY 495 

Query: 611 SPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSK 670 

+ +E + +P+E +P++ + P+E + +P+E +P++ + P+E ++P++ 
Sbjct: 496 ASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTE 555 

Query: 671 RSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHR 730 

+ +PA + P+E + ++P+E + ++P+E ++P E + +P+E + +P+E 
Sbjct: 556 ETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPY 615 

Query: 731 SPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFE 790 

P+E ++P+E++ ++P+E + ++ +E ++P E + ++ E + P+E ++ E 
Sbjct: 616 EPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTE 675 

Query: 791 RS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQG 849 

+ + E +++P+E++ +P E + +P+E + + +T ++P+E + +P+ 
Sbjct: 676 ETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPVEPTEETTYAPTEETTYAPTEETMY 735 

Query: 850 RTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKEGLKYSFPGERPSHS 908 

E + E T ++P+E P+ +P+E + P E Y+ P E +++ 

Sbjct: 736 APIEETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGE-TTYA-PTEETTYA 792 

Score = 420 (63.0 bits), Expect = 2.3e-35, P = 2.3e-35 
Identities = 82/393 (20%), Positives = 206/393 (52%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E TP P+E T + P+ +P+E + + +E ++P++ + +P+ + P+E + 

Sbjct: 971 EETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPTEETT 1030 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+P++ +P E + ++ +E ++P++ + P++ + P+E + + P+E + + P+ 
Sbjct: 1031 YAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPT 1090 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 

E +P++ + P+E + +P+E P++ + P+E ++P++ + +P + 

Sbjct: 1091 EETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETT 1150 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

P+E + ++P+E + ++P+E ++P+E + P+ + +P+E +P+E ++P+ 
Sbjct: 1151 YGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPT 1210 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 

E++ ++P+E + + P+E ++P E + + E + +P+E ++ E + + E 
Sbjct: 1211 EETTYAPTEETPYEPTEETTYAPTEETTYEPTEETTYAPTEETTYAPTEETTYAPTEETM 1270 

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 

++ p +++ p E + +P+E ++ + +T ++P+E + P+G +E + + 

Sbjct: 1271 YAPIDETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPT 1330 

Query: 861 ERTRHSPSEMRPGRP SGRNHCSPSE 885 

E T ++P E P P S C+ E 

Sbjct: 1331 EETTYAPMEETPYEPAEESTSTVSTEKPCNTEE 1363 

Score = 419 (62.9 bits), Expect = 3.0e-35, P = 3.0e-35 
Identities = 83/411 (20%), Positives = 215/411 (52%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E T P+E T + P+ +P+E + E ++P++ + +P+ + +P E + 

Sbjct: 947 EETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETT 1006 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+ p++ +P E + + +E ++P++ + P++ ++ +E + +P+E + +P+ 
Sbjct: 1007 YAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPA 1066 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 

E P++ + P+E + +P+E +P++ + P+E ++P++ + P + 

Sbjct: 1067 EETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETT 1126 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

+P+E + ++P+E ++P E + P+E + +P+E + +P+E +P+E + P+ 
Sbjct: 1127 YAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPT 1186 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 

++ ++P+E + ++P+E ++P E + ++ E + P+E ++ E + + E + 
Sbjct: 1187 GETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEETTYEPTEETT 1246 

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 

++P+E++ +P E + +P+E ++ +T + P+E + +P+ +E + + 

Sbjct: 1247 YAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYAPTEETPYAPT 1306 

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKEGLKYSFPGERPSHSLSRD 912 

E T + P+ P+ +P+E + +P++E Y P E + ++S + 

Sbjct: 1307 EETTYEPTGETTYAPTEETTYAPTEETTYAPMEE-TPYE-PAEESTSTVSTE 1356 

Score = 415 (62.3 bits), Expect = 8.0e-35, P = 8.0e-35 
Identities = 84/423 (19%), Positives = 218/423 (51%) 

Query: 473 PGSPVKRTWHRHLKDKLTHKEHNHPSFYR-ERTPRGPSERTRHNPSWRNHRSPSERSQRS 531 

P P + T + K+ T+ ++ E T P+E T + P+ P+E + + 

Sbjct: 873 PYEPTEETTYAPTKET-TYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYA 936 

Query: 532 SLERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQR 591 

E ++P++ + +P+ + +P+E + +P++ P E + ++ +E ++P++ 

Sbjct: 937 PTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEE 996 

Query: 592 SHRGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRS 651 
+ P + ++P+E + +P+E + P+E +P++ + P+E + + +E + 
Sbjct: 997 TMYAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYA 1056 

Query: 652 PSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSER 711 

p ++ + p+E + P++ + +P + + P+E + ++P+E + ++P+E ++P+E 
Sbjct: 1057 PTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEE 1116 

Query: 712 SHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHS 771 

+ P+E + +P+E +P+E ++P E++ + P+E + ++P+E ++P E + ++ 
Sbjct: 1117 TPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYA 1176 

Query: 772 LLERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGK 830 

E + P+ ++ E + + E +++P+E++ +P E + P+E ++ + + 
Sbjct: 1177 PTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEE 1236 

Query: 831 TCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRS 890 

T + P+E + +P+ +E + + E T ++P + P+ +P+E + + 

Sbjct: 1237 TTYEPTEETTYAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYA 1296 

Query: 891 PLKE 894 
P +E 

Sbjct: 1297 PTEE 1300 

Score = 403 (60.5 bits), Expect = 1.6e-33, P = 1.6e-33 
Identities = 84/394 (21%), Positives = 213/394 (54%) 



Query: 501 RERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERS 560 

RE T PSE T + P +P+E+ +E + + ++ +P++ ++P+ER 
Sbjct: 319 REETTAAPSEDTTYAPREVTPYAPTEKPY--DVEETTYVTEESTY-APTKSETNAPTERM 375 

Query: 561 WRSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSP 620 

+ ++ C E + ++ +E ++P++ + P++ ++P+E + P+E + +P 
Sbjct: 376 HYAHIEKP-CDT-EVTMYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYTP 433 

Query: 621 SERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRS 680 

+E + p ++ + P+E++ +P+E + P++ + P+E ++P+K + +P + 
Sbjct: 434 TEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEET 493 

Query: 681 HRSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSP 740 

+ +E + ++p+E + ++P+E + P+E + +P+E + +P+E +P+E ++P 
Sbjct: 494 TYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAP 553 

Query: 741 SEKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISER 799 

+E++ ++P+E + + P+E ++P E+++E++PE++ E++ E 
Sbjct: 554 TEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEET 613 

Query: 800 SHSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSS 859 

+ P+E++ +P E + +P+E ++S+ +T ++P+E + +P+ +E + + 
Sbjct: 614 PYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAP 673 

Query: 860 CERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKE 894 

E T ++P+E P+ +P+E + +P +E 
Sbjct: 674 TEETTYAPTEETTYAPTEETTYAPTEETTYAPAEE 703 



Score = 398 (59.7 bits), Expect = 5.5e-33, P = 5.5e-33 
Identities = 84/402 (20%), Positives = 209/402 (51%) 



Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533 

+ p + T + +++ T+ ++ E TP P+E T + P+ +P+E + +S 
Sbjct: 992 APTEETMYAPIEET-TYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAST 1050 

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593 

E ++P++ + +P+ + P+E + +P++ +P E + ++ +E ++P++ + 
Sbjct: 1051 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 1110 

Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653 

P++ + P+E + +P+E + +P+E +P + + GP+E + +P+E +P+ 
Sbjct: 1111 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPT 1170 

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713 

+ + P+E + P+ + +P + +P+E + ++P+E + ++P+E + P+E + 
Sbjct: 1171 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETT 1230 

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773 

+P+E + P+E +P+E ++P+E++ ++P+E + ++P + + P E + ++ 
Sbjct: 1231 YAPTEETTYEPTEETTYAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPT 1290 

Query: 774 ERSHRSPSERRSHRSFERSHRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTCH 833 

E + +P+E + E E ++ P+ ++ +P E + +P+E ++ +T + 
Sbjct: 1291 EATTYAPTEETPYAPTE ETTYEPTGETTYAPTEETTYAPTEETTYAPMEETPY 1343 

Query: 834 SPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPS 876 

P+E S+S+ TE+ +ET PS + P + 

Sbjct: 1344 EPAEESTSTVSTEKPCNTEEFTDEPTDEPT-DEPSDEPTDEPT 1385 

Score = 368 (55.2 bits), Expect = 9.5e-30, P = 9.5e-30 
Identities = 79/386 (20%), Positives = 211/386 (54%) 

Query: 524 PSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSER 583 

PS+ ++ + E + P + + +PS +P E + + P+++ + E + + ++E 

Sbjct: 303 PSDETEAPT-EGTTYVPREETTAAPSEDTTYAPREVTPYAPTEKPY--DVEETTY-VTEE 358 

Query: 584 GLHSPSQRSHRGPSQRRHHSPSER SHRSPSERSHRSPSERRHRSPSQRSHRGPS 637 

++P++ P++R H++ E+ + +P+E + +P+E +P++ + P+ 

Sbjct: 359 STYAPTKSETNAPTERMHYAHIEKPCDTEVTMYAPTEETTYAPTEETTYAPTEETTYAPT 418 

Query: 638 ERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSH 697 

E + P+E + P+ + + P+E ++P++++ +P + +P+E + + P+E + 
Sbjct: 419 EETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETT 478 

Query: 698 HSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPS 757 

++ p ++ ++P+E + + +E + +P+E + P+E + P+E++ ++P+E + ++P+ 
Sbjct: 479 YAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPT 538 

Query: 758 ERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSR 816 

E ++P E + ++ E + +P+E + E + + E +++P+E++ +P+E + 
Sbjct: 539 EETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETT 598 

Query: 817 CSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPS 876 

+P+E ++ + +T + P+E + +P+ +E + +S E T ++P+E P+ . 

Sbjct: 599 YAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPA 658 

Query: 877 GRNHCSPSERSRRSPLKEGLKYSFPGERPSHS 908 

P+E + +P +E Y+ P E +++ 
Sbjct: 659 EETPYEPTEETTYAPTEE-TTYA-PTEETTYA 688 

Score = 337 (50.6 bits), Expect = 2.1e-26, P = 2.1e-26 
Identities = 66/328 (20%), Positives = 170/328 (51%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E T P+E T + P+ +P+E + + E ++P++ + +P+ + +P+E + 

Sbjct: 1059 EETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETP 1118 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

P++ +P E + ++ +E +++P + + GP++ ++P+E + +P+E + +P+ 
Sbjct: 1119 YEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPT 1178 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 

E P+ + P+E + +P+E +P++ + P+E + P++ + +P + 

Sbjct: 1179 EETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEETT 1238 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

p+E + ++P+E + ++P+E ++P+E + +P + + P+E +P+E ++P+ 
Sbjct: 1239 YEPTEETTYAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYAPT 1298 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERSHRRIS 797 

E++ ++P+E + + P+ ++P E + ++ E + +P E + E S +S 
Sbjct: 1299 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPMEETPYEPAEESTSTVSTEKP 1358 

Query: 798 ERSHSPSEKSHLSPLERSRCSPSE 821 

E + P+++ P + P++ 
Sbjct: 1359 CNTEEFTDEPTDEPTDEPSDEPTDEPTD 1386 

Score = 333 (50.0 bits), Expect = 5.7e-26, P = 5.7e-26 
Identities = 63/320 (19%), Positives = 166/320 (51%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E T P+E T + P+ +P+E + + E ++P++ + P+ + +P+E + 

Sbjct: 1075 EETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 1134 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+P++ +P E + + +E ++P++ + P++ ++P+E + P+ - + +P+ 
Sbjct: 1135 YAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPT 1194 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 

E +P++ + P+E + +P+E P++ + P+E + P++ + +P + 

Sbjct: 1195 EETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEETTYEPTEETTYAPTEETT 1254 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

+P+E + ++P+E + ++P + + P+E + +P+E + +P+E +P+E + P+ 
Sbjct: 1255 YAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPT 1314 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERSHRRISERSH 801 






++ ++P+E + ++P+E ++P+E + + ES+S+ +E+ E+ 
Sbjct: 1315 GETTYAPTEETTYAPTEETTYAPMEETPYEPAEESTSTVSTEKPCNTEEFTDEPTDEPTD 1374 

Query: 802 SPSEKSHLSPLERSRCSPSE 821 

PS + + P + P++ 
Sbjct: 1375 EPSDEPTDEPTDEPTDLPTD 1394 

Score = 303 (45.5 bits), Expect = 9.6e-23, P = 9.6e-23 
Identities = 70/322 (21%), Positives = 170/322 (52%) 



Query: 584 GLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCS 643 

G + PS + P++ + P E + +PSE + +P E + P+++ + E ++ + 
Sbjct: 299 GGYEPSDETE-APTEGTTYVPREETTAAPSEDTTYAPREVTPYAPTEKPY-DVEETTYVT 

Query: 644 PSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSER 703 

E +P++ P+ER H++ ++ + + +P+E + ++P+E + ++P+E 
Sbjct: 357 --EESTYAPTKSETNAPTERMHYAHIEKPCDTEV--TMYAPTEETTYAPTEETTYAPTEE 412 

Query: 704 RHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHS 763 

++P+E + P+E + +P+E +P+E ++P+EK+ ++P+E + ++P+E + 
Sbjct: 413 TTYAPTEETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYE 472 

Query: 764 PLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSER 822 

P E + ++ + + + P+E ++ S E + + E +++P+E++ P E + + P+E 
Sbjct: 473 PTEETTYAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEE 532 

Query: 823 RGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCS 882 

++ + +T ++P+E + +P+ +E + E T ++P+E P+ + 
Sbjct: 533 TTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYA 592 

Query: 883 PSERSRRSPLKEGLKYSFPGERP 905 

P E + + P +E Y+ E P 
Sbjct: 593 PIEETTYAPTEE-TTYAPAEETP 614 




Score = 151 (22.7 bits), Expect = 2.0e-06, P = 2.0e-06 
Identities = 45/198 (22%), Positives = 103/198 (52%) 

Query: 716 PSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLLER 775 

PS+ + +P+E P E + PSE + ++P E + ++P+E+ +E + + + E 
Sbjct: 303 PSDETE-APTEGTTYVPREETTAAPSEDTTYAPREVTPYAPTEKPYD--VEETTY-VTEE 358 

Query: 776 SHRSPSERRSHRSFERSHRRISERS HSPSEKSHLSPLERSRCSPSERRGHSSS 828 

s +p++ ++ ER H E+ ++P+E++ +P E + +P+E ++ + 
Sbjct: 359 STYAPTKSETNAPTERMHYAHIEKPCDTEVTMYAPTEETTYAPTEETTYAPTEETTYAPT 418 

Query: 829 GKTCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSR 888 

+T + P+E + +P+ +E + + E+T ++P+E P+ P+E + 
Sbjct: 419 EETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETT 478 

Query: 889 RSPLKEGLKYSFPGERPSHSLSRD 912 

+P KE Y+ P E +++ + + 
Sbjct: 479 YAPTKE-TTYA-PTEETTYASTEE 500 






Pedant information for DKFZphtes3_8gll, frame 2 





Report for DKFZphtes3_8gll.2 



[LENGTH] 954 

[MW] 110063.05 

[pI] 11.40 

[PROSITE] ATP_GTP_A 1 

[KW] irregular 

[KW] LOW_COMPLEXITY 27.67 % 



SEQ ESSLSIFYDREDLVPMEESEDSQSDSQTRISESQHSLKPNYLSQAKTDFSEQFQLLEDLQ 

SEG xxxxxxxxxxx 

PRO ccceeeccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhh 

SEQ LKIAAKLLRSQIPPDVPPPLASGLVLKYPICLQCGRCSGLNCHHKLQTTSGPYLLIYPQL 

SEG 

PRD hhhhhhhhhhcccccccccccceeeeecceeecccccccccccccccccccceeeehhhh 

SEQ HLVRTPEGHGEVRLHLGFRLRIGKRSQISKYRERDRPVIRRSPISPSQRKAKIYTQASKS 

SEG 

PRD hcccccccccceeecccceeeccccccccccccccceeeeeccccccchhhhhhhccccc 

SEQ PTSTIDLQSGPSQSPAPVQVYIRRGQRSRPDLVEKTKTRAPGHYEFTQVHNLPESDSEST 
PRD ccccccccccccccccceeeeeeeccccccchhhhhhcccccceeeeeecccccccccch 

SEQ QNEKRAKVRTKKTSOSKYPMKRITKRLRKHRKFYTNSRTTIESPSRELAAHLRRKRIGAT 

SEG 

PRD hhhhhhhhhhccccccccccchhhhhhhhhhhccccccccccccchhhhhhhhhhhhhcc 

SEQ QTSTASLKRQPKKPSQPKFMQLLFQSLKRAFQTAHRVIASVGRKPVDGTRPDNLWASKNY 

SEG 

PRD ccchhhhhccccccccchhhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccc 

SEQ YPKQNARDYCLPSSIKRDKRSADKLTPAGSTIKQEDILWGGTVQCRSAQQPRRAYSFQPR 

SEG 

PRD cccccccccccccccccccccccccccccccccccceeeccccccccccccccccccccc 

SEQ PLRLPKPTDSQSGIAFQTASVGQPLRTVQKDSSSRSKKNFYRNETSSQESKNLSTPGTRV 

SEG 

PRD ccccccccccccceeeecccccccceeeeeccccccccccccccccccccccccccccee 

SEQ QARGRILPGSPVKRTWHRHLKDKLTHKEHNHPSFYRERTPRGPSERTRHNPSWRNHRSPS 

SEG xxxxx 

PRD eeecccccccccccccccccccccccccccccceeeeccccccccccccccccccccccc 

SEQ ERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGL 

SEG xxxxxxxxxxxxxx xxxxxxxxxxxx 

PRD chhhhhhhhhhccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPS 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ ERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRH 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPL 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ ERSRHSLLERSHRSPSERRSHRSFERSHRRISERSHSPSEKSHLSPLERSRCSPSERRGH 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhccccccccchhhhhhhhhhhhhccccccccccccccccccccccccccc 

SEQ SSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSE 

SEG xxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ RSRRSPLKEGLKYSFPGERPSHSLSRDFKNQTTLLGTTHKNPKAGQVWRPEATR 

SEG 

PRD ccccccccccceeecccccccccccccccccccccccccccccccccccccccc 

Prosite for DKFZphtes3_8gll.2 

PS00017 839->847 ATP_GTP_A PDOC00017 

(No Pfam data available for DKFZphtes3_8gll . 2) 
DKFZphtes3_8g5 



group: testes derived 

DKFZphtes3_8g5 encodes a novel 544 amino acid protein nearly identical to the human KIAA0875 
protein. 

The novel protein is a new splice variant of KIAA0875. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



KIAA0875, alternatively spliced 

complete cDNA, complete cds, EST hits 

Sequenced by MediGenomix 

Locus : unknown 

Insert length: 2762 bp 

No poly A stretch found, no polyadenylation signal found 



1 CCGACATCGG CCGTGTCTCC AGCACCTGCC GGCGGCTGCG CGAGCTGTGC 
51 CAGAGCAGCG GGAAGGTGTG GAAGGAGCAG TTCCGGGTGA GGTGACCTTC 
101 CCTTATGAAA CACTACAGCC CCACCGACTA CGTCAATTGG TTGGAAGAGT 
151 ATAAAGTTCG GCAAAAAGCT GGGTTAGAAG CGCGGAAGAT TGTAGCCTCG 
201 TTCTCAAAGA GGTTCTTTTC AGAGCACGTT CCTTGTAATG GCTTCAGTGA 
251 CATTGAGAAC CTTGAAGGAC CAGAGATTTT TTTTGAGGAT GAACTGGTGT 
301 GTATCCTAAA TATGGAAGGA AGAAAAGCTT TGACCTGGAA ATACTACGCA 
351 AAAAAAATTC TTTACTACCT GCGGCAACAG AAGATCTTAA ATAATCTTAA 
401 GGCCTTTCTT CAGCAGCCAG ATGACTATGA GTCGTATCTT GAAGGTGCTG 
451 TATATATTGA CCAGTACTGC AATCCTCTCT CCGACATCAG CCTCAAAGAC 
501 ATCCAGGCCC AAATTGACAG CATCGTGGAG CTTGTTTGCA AAACCCTTCG 
551 GGGCATAAAC AGTCGCCACC CCAGCTTGGC CTTCAAGGCA GGTGAATCAT 
601 CCATGATAAT GGAAATAGAA CTCCAGAGCC AGGTGCTGGA TGCCATGAAC 
651 TATGTCCTTT ACGACCAACT GAAGTTCAAG GGGAATCGAA TGGATTACTA 
701 TAATGCCCTC AACTTATATA TGCATCAGGT TTTGATTCGC AGAACAGGAA 
751 TCCCAATCAG CATGTCTCTG CTCTATTTGA CAATTGCTCG GCAGTTGGGA 
801 GTCCCACTGG AGCCTGTCAA CTTCCCAAGT CACTTCTTAT TAAGGTGGTG 
851 CCAAGGCGCA GAAGGGGCGA CCCTGGACAT CTTTGACTAC ATCTACATAG 
901 ATGCTTTTGG GAAAGGCAAG CAGCTGACAG TGAAAGAATG CGAGTACTTG 
951 ATCGGCCAGC ACGTGACTGC AGCACTGTAT GGGGTGGTCA ATGTCAAGAA 
1001 GGTGTTACAG AGAATGGTGG GAAACCTGTT AAGCCTGGGG AAGCGGGAAG 
1051 GCATCGACCA GTCATACCAG CTCCTGAGAG ACTCGCTGGA TCTCTATCTG 
1101 GCAATGTACC CGGACCAGGT GCAGCTTCTC CTCCTCCAAG CCAGGCTTTA 
1151 CTTCCACCTG GGAATCTGGC CAGAGAAGTC TTTCTGTCTT GTTTTGAAGG 
1201 TGCTTGACAT CCTCCAGCAC ATCCAAACCC TAGACCCGGG GCAGCACGGG 
1251 GCGGTGGGCT ACCTGGTGCA GCACACTCTA GAGCACATTG AGCGCAAAAA 
1301 GGAGGAGGTG GGCGTAGAGG TGAAGCTGCG CTCCGATGAG AAGCACAGAG 
1351 ATGTCTGCTA CTCCATCGGG CTCATTATGA AGCATAAGAG GTATGGCTAT 
1401 AACTGTGTGA TCTACGGCTG GGACCCCACC TGCATGATGG GACACGAGTG 
1451 GATCCGGAAC ATGAACGTCC ACAGCCTGCC GCACGGCCAC CACCAGCCTT 
1501 TCTATAACGT GCTGGTGGAG GACGGCTCCT GTCGATACGC AGCCCAAGAA 
1551 AACTTGGAAT ATAACGTGGA GCCTCAAGAA ATCTCACACC CTGACGTGGG 
1601 ACGCTATTTC TCAGAGTTTA CTGGCACTCA CTACATCCCA AACGCAGAGC 
1651 TGGAGATCCG GTATCCAGAA GATCTGGAGT TTGTCTATGA AACGGTGCAG 
1701 AATATTTACA GTGCAAAGAA AGAGAACATA GATGAGTAAA GTCTAGAGAG 
1751 GACATTGCAC CTTTGCTGCT GCTGCTATCT TCCAAGAGAA CGGGACTCCG 
1801 GAAGAAGACG TCTCCACGGA GCCCTCGGGA CCTGCTGCAC CAGGAAAGCC 
1851 ACTCCACCAG TAGTGCTGGT TGCCTCCTAC TAAGTTTAAA TACCGTGTGC 
1901 TCTTCCCCAG CTGCAAAGAC AATGTTGCTC TCCGCCTACA CTAGTGAATT 
1951 AATCTGAAAG GCACTGTGTC AGTGGCATGG CTTGTATGCT TGTCCTGTGG 
2001 TGACAGTTTG TGACATTCTG TCTTCATGAG GTCTCACAGT CGACGCTCCT 
2051 GTAATCATTC TTTGTATTCA CTCCATTCCC CTGTCTGTCT GCATTTGTCT 
2101 CAGAACATTT CCTTGGCTGG ACAGATGGGG TTATGCATTT GCAATAATTT 
2151 CCTTCTGATT TCTCTGTGGA ACGTGTTCGG TCCCGAGTGA GGACTGTGTG 
2201 TCTTTTTACC CTGAAGTTAG TTGCATATTC AGAGGTAAAG TTGTGTGCTA 
2251 TCTTGGCAGC ATCTTAGAGA TGGAGACATT AACAAGCTAA TGGTAATTAG 
2301 AATCATTTGA ATTTATTTTT TTCTAATATG TGAAACACAG ATTTCAAGTG 
2351 TTTTATCTTT TTTTTTTTTA AATTTAAATG GGAATATAAC ACAGTTTTCC 
2401 CTTCCATATT CCTCTCTTGA GTTTATGCAC ATCTCTATAA ATCATTAGTT 
2451 TTCTATTTTA TTACATAAAA TTCTTTTAGA AAATGCAAAT AGTGAACTTT 
2501 GTGAATGGAT TTTTCCATAC TCATCTACAA TTCCTCCATT TTAAATGACT 
2551 ACTTTTATTT TTTAATTTAA AAAATCTACT TCAGTATCAT GAGTAGGTCT 
2601 TACATCAGTG ATGGGTTCTT TTTGTAGTGA GACATACAAA TCTGATGTTA 
2651 ATGTTTGCTC TTAGAAGTCA TACTCCATGG TCTTCAAAGA CCAAAAAATG 
2701 AGGTTTTGCT TTTGTAATCA GGAAAAAAAA AATTAATGAA CCTTAAAAAA 
2751 AAAAAAAAAA GG 



BLAST Results 



No BLAST result 



No Medline entry 



Medline entries 



Peptide information for frame 3 



ORF from 105 bp to 1736 bp; peptide length: 544 
Category: known protein 
Classification: unclassified 
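
The ORF coordinates above are consistent with the stated frame and peptide length: a 1-based start of 105 falls in reading frame 3, and the span from 105 to 1736 covers 544 codons. The following is a minimal Python sketch of that arithmetic, not part of the original report. The function name orf_summary is illustrative, and the sketch assumes that the coordinates are 1-based and inclusive and that the end coordinate excludes the stop codon.

```python
def orf_summary(start, end):
    """Return (reading frame, codon count) for an ORF.

    Assumes 1-based, inclusive nucleotide coordinates, with the end
    coordinate on the last base of the final coding codon.
    """
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    frame = (start - 1) % 3 + 1  # frames are numbered 1, 2, 3
    return frame, span // 3


frame, codons = orf_summary(105, 1736)
print(frame, codons)
```

With the coordinates given above, the sketch reports frame 3 and 544 codons, in agreement with the peptide length in the report.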



1 MKHYSPTDYV NWLEEYKVRQ KAGLEARKIV ASFSKRFFSE HVPCNGFSDI 
51 ENLEGPEIFF EDELVCILNM EGRKALTWKY YAKKILYYLR QQKILNNLKA 
101 FLQQPDDYES YLEGAVYIDQ YCNPLSDISL KDIQAQIDSI VELVCKTLRG 
151 INSRHPSLAF KAGESSMIME IELQSQVLDA MNYVLYDQLK FKGNRMDYYN 
201 ALNLYMHQVL IRRTGIPISM SLLYLTIARQ LGVPLEPVNF PSHFLLRWCQ 
251 GAEGATLDIF DYIYIDAFGK GKQLTVKECE YLIGQHVTAA LYGVVNVKKV 
301 LQRMVGNLLS LGKREGIDQS YQLLRDSLDL YLAMYPDQVQ LLLLQARLYF 
351 HLGIWPEKSF CLVLKVLDIL QHIQTLDPGQ HGAVGYLVQH TLEHIERKKE 
401 EVGVEVKLRS DEKHRDVCYS IGLIMKHKRY GYNCVIYGWD PTCMMGHEWI 
451 RNMNVHSLPH GHHQPFYNVL VEDGSCRYAA QENLEYNVEP QEISHPDVGR 
501 YFSEFTGTHY IPNAELEIRY PEDLEFVYET VQNIYSAKKE NIDE 
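
The peptide listed above can be checked against the cDNA by translating from the ORF start. The following Python sketch is an illustration only. It assumes the standard genetic code, and the names CODON_TABLE, translate and ORF_START are hypothetical. ORF_START holds the first 45 nucleotides of the ORF, copied from positions 105 to 149 of the cDNA listing.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA string codon by codon; '*' marks a stop codon.

    Spaces are ignored so that sequence copied in blocks of ten can be
    pasted directly. Trailing bases that do not fill a codon are dropped.
    """
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Positions 105-149 of the DKFZphtes3_8g5 cDNA (first 15 codons of the ORF).
ORF_START = "ATGAAACACTACAGCCCCACCGACTACGTCAATTGGTTGGAAGAG"
print(translate(ORF_START))
```

The translation of these 15 codons is MKHYSPTDYVNWLEE, which matches the start of the peptide listing. Ambiguous bases such as N are not handled and would raise a KeyError in this sketch.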

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8g5, frame 3 

TREMBLNEW:AB020682_1 gene: "KIAA0875"; product: "KIAA0875 protein"; 
Homo sapiens mRNA for KIAA0875 protein, partial cds., N = 1, Score = 
2832, P = 5.5e-295 

>TREMBLNEW:AB020682_1 gene: "KIAA0875"; product: "KIAA0875 protein"; Homo 
sapiens mRNA for KIAA0875 protein, partial cds. 
Length = 621 

HSPs: 

Score = 2832 (424.9 bits), Expect = 5.5e-295, P = 5.5e-295 
Identities = 537/544 (98%), Positives = 537/544 (98%) 

Query: 1 MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF 60 

MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF 
Sbjct: 85 MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF 144 

Query: 61 EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ 120 

EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ 
Sbjct: 145 EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ 204 

Query: 121 YCNPLSDISLKDIQAQIDSIVELVCKTLRGINSRHPSLAFKAGESSMIMEIELQSQVLDA 180 

YCNPLSDISLKDIQAQIDSIVELVCKTLRGINSRHPSLAFKAGESSMIMEIELQSQVLDA 
Sbjct: 205 YCNPLSDISLKDIQAQIDSIVELVCKTLRGINSRHPSLAFKAGESSMIMEIELQSQVLDA 264 

Query: 181 MNYVLYDQLKFKGNRMDYYNALNLYMHQVLIRRTGIPISMSLLYLTIARQLGVPLEPVNF 240 

MNYVLYDQLKFKGNRMDYYNALNLYMHQVLIRRTGIPISMSLLYLTIARQLGVPLEPVNF 
Sbjct: 265 MNYVLYDQLKFKGNRMDYYNALNLYMHQVLIRRTGIPISMSLLYLTIARQLGVPLEPVNF 324 

Query: 241 PSHFLLRWCQGAEGATLDIFDYIYIDAFGKGKQLTVKECEYLIGQHVTAALYGVVNVKKV 300 

PSHFLLRWCQGAEGATLDIFDYIYIDAFGKGKQLTVKECEYLIGQHVTAALYGVVNVKKV 
Sbjct: 325 PSHFLLRWCQGAEGATLDIFDYIYIDAFGKGKQLTVKECEYLIGQHVTAALYGVVNVKKV 384 

Query: 301 LQRMVGNLLSLGKREGIDQSYQLLRDSLDLYLAMYPDQVQLLLLQARLYFHLGIWPEKSF 360 

LQRMVGNLLSLGKREGIDQSYQLLRDSLDLYLAMYPDQVQLLLLQARLYFHLGIWPEK 
Sbjct: 385 LQRMVGNLLSLGKREGIDQSYQLLRDSLDLYLAMYPDQVQLLLLQARLYFHLGIWPEK-- 442 

Query: 361 CLVLKVLDILQHIQTLDPGQHGAVGYLVQHTLEHIERKKEEVGVEVKLRSDEKHRDVCYS 420 

VLDILQHIQTLDPGQHGAVGYLVQHTLEHIERKKEEVGVEVKLRSDEKHRDVCYS 
Sbjct: 443 VLDILQHIQTLDPGQHGAVGYLVQHTLEHIERKKEEVGVEVKLRSDEKHRDVCYS 497 

Query: 421 IGLIMKHKRYGYNCVIYGWDPTCMMGHEWIRNMNVHSLPHGHHQPFYNVLVEDGSCRYAA 480 

IGLIMKHKRYGYNCVIYGWDPTCMMGHEWIRNMNVHSLPHGHHQPFYNVLVEDGSCRYAA 
Sbjct: 498 IGLIMKHKRYGYNCVIYGWDPTCMMGHEWIRNMNVHSLPHGHHQPFYNVLVEDGSCRYAA 557 

Query: 481 QENLEYNVEPQEISHPDVGRYFSEFTGTHYIPNAELEIRYPEDLEFVYETVQNIYSAKKE 540 

QENLEYNVEPQEISHPDVGRYFSEFTGTHYIPNAELEIRYPEDLEFVYETVQNIYSAKKE 
Sbjct: 558 QENLEYNVEPQEISHPDVGRYFSEFTGTHYIPNAELEIRYPEDLEFVYETVQNIYSAKKE 617 

Query: 541 NIDE 544 
NIDE 

Sbjct: 618 NIDE 621 
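
The percentages on the Identities and Positives lines appear to be truncated to whole numbers rather than rounded: for the HSP above, 537/544 is about 98.7% and is reported as 98%. The Python sketch below illustrates that reading. The function name hsp_percent is hypothetical, and the truncation rule is inferred from the figures in this report rather than taken from any program documentation.

```python
def hsp_percent(matches, aligned_length):
    """Whole-number percentage of matching positions, truncated (not rounded).

    The truncation is an assumption inferred from the reported values.
    """
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned_length


print(hsp_percent(537, 544))
```

Under this assumption the sketch gives 98 for 537/544, in line with the figure shown for this HSP.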

Pedant information for DKFZphtes3_8g5, frame 3 



Report for DKFZphtes3_8g5 . 3 

[LENGTH] 544 

[MW] 63307.22 

[pI] 5.82 

[HOMOL] TREMBL:AB020682_1 gene: "KIAA0875"; product: "KIAA0875 protein"; Homo sapiens 

mRNA for KIAA0875 protein, partial cds. 0.0 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 1.84 % 

SEQ MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF 

SEG 

PRD cccccccccchhhhhhhhhhhhhchhhhhhhhhhhhhhhcccccccccccccccccceee 

SEQ EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ 

SEG 

PRD eeeeeeeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccceeecceeeeeee 

SEQ YCNPLSDISLKDIQAQIDSIVELVCKTLRGINSRHPSLAFKAGESSMIMEIELQSQVLDA 

SEG 

PRD ccccccccchhhhhhhhhhhhhhhhhhcccccccccceeeecccchhhhhhhhhhhhhhh 

SEQ MNYVLYDQLKFKGNRMDYYNALNLYMHQVLIRRTGIPISMSLLYLTIARQLGVPLEPVNF 

SEG 

PRD hhhhhccccccccccchhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhcccccccccc 

SEQ PSHFLLRWCQGAEGATLDIFDYIYIDAFGKGKQLTVKECEYLIGQHVTAALYGVVNVKKV 

SEG 

PRD cceeeeeeccccccceeeeeeeeeeeccccceeeeeehhhhhhhhhhhhhhhhhhhhhhh 

SEQ LQRMVGNLLSLGKREGIDQSYQLLRDSLDLYLAMYPDQVQLLLLQARLYFHLGIWPEKSF 

SEG 

PRD hhhhhccchhhhhhhhccccccchhhhhhhhhhhccchhhhhhhhhhhhhhcccccceee 

SEQ CLVLKVLDILQHIQTLDPGQHGAVGYLVQHTLEHIERKKEEVGVEVKLRSDEKHRDVCYS 

SEG xxxxxxxxxx 

PRD ehhhhhhhhhhhhhccccccccchhhhhhhhhhhhhhhhhhhheeeeecccccceeeecc 

SEQ IGLIMKHKRYGYNCVIYGWDPTCMMGHEWIRNMNVHSLPHGHHQPFYNVLVEDGSCRYAA 

SEG 

PRD cccchhhhhhhceeeeecccccccchhhhhhhhhhhccccccccccceeeeecccceeee 

SEQ QENLEYNVEPQEISHPDVGRYFSEFTGTHYIPNAELEIRYPEDLEFVYETVQNIYSAKKE 

SEG 

PRD hhhhhhhhcccccccccceeeeccccccccccchhhhhhccchhhhhhhhhhhhhccccc 

SEQ NIDE 

SEG 

PRD cccc 

(No Prosite data available for DKFZphtes3_8g5.3) 

(No Pfam data available for DKFZphtes3_8g5 . 3) 
DKFZphtes3_8mlO 



group: nucleic acid management 

DKFZphtes3_8mlO encodes a novel 221 amino acid protein with strong similarity to 
polyadenylate-binding proteins. 



stability and the translation of mRNA. 

The new protein can find application in modulation of mRNA translation and 
processing/stability. 



strong similarity to polyadenylate-binding protein 

frame shift at Bp 707-710 

Sequenced by MediGenomix 

Locus : unknown 

Insert length: 2107 bp 

Poly A stretch at pos. 2052, polyadenylation signal at pos. 



1 CGGAAAGGTC GCGGCTTGTG TGCCTGCGGG CAGCCGTGCC GAGAATGAAC 
51 CCCAGCACCC CCAGCTACCC AACGGCCTCG CTCTACGTGG GGGACCTCCA 
101 CCCCGACGTG ACTGAGGCGA TGCTCTACGA GAAGTTCAGC CCGGCAGGGC 
151 CCATCCTCTC CATCCGGATC TGCAGGGACT TGATCACCAG CGGCTCCTCC 
201 AACTACGCGT ATGTGAACTT CCAGCATACG AAGGACGCGG AGCATGCTCT 
251 GGACACCATG AATTTTGATG TTATAAAGGG CAAGCCAGTA CGCATCATGT 
301 GGTCTCAGCG TGATCCATCA CTTCGAAAAA GTGGAGTGGG CAACATATTC 
351 GTTAAAAATC TGGATAAGTC CATTAATAAT AAAGCACTGT ATGATACAGT 
401 TTCTGCTTTT GGTAACATCC TTTCGTGTAA CGTGGTTTGT GATGAAAATG 
451 GTTCCAAGGG TTATGGATTT GTACACTTTG AGACACACGA AGCAGCTGAA 
501 AGAGCTATTA AAAAAATGAA CGGAATGCTC CTAAATGGTC GCAAAGTATT 
551 TGTTGGACAA TTTAAGTCTC GTAAAGAACG AGAAGCTGAA CTTGGAGCTA 
601 GGGCAAAAGA GTTCCCCAAT GTTTACATCA AGAATTTTGG AGAAGACATG 
651 GATGATGAGC GCCTTAAGGA TCTCTTTGGC AAGTTCGGGC CCGCCTTAAG 
701 TGTGAATTAA TGACCGATGA AAGTGGAAAA TCCAAAGGAT TTGGATTTGT 
751 AAGCTTTGAA AGGCATGAAG ATGCACAGAA AGCTGTAGAT GAGATGAATG 
801 GAAAGGAGCT CAATGGAAAA CAAATTTACG TTGGTCGAGC TCAGAAAAAA 
851 GTGGAACGGC AGACGGAACT TAAGCGCACA TTTGAACAGA TGAAGCAAGA 
901 TAGGATCACC AGATACCAGG TTGTTAATCT TTATGTGAAA AATCTTGATG 
951 ATGGTATTGA TGATGAACGT CTCCGGAAAG CGTTTTCTCC ATTTGGTACA 
1001 ATCACTAGTG CAAAGGTTAT GATGGAAGGT GGTCGCAGCA AAGGGTTTGG 
1051 TTTTGTATGT TTCTCCTCCC CAGAAGAAGC CACTAAAGCA GTTACAGAAA 
1101 TGAACGGTAG AATTGTGGCC ACAAAGCCAT TGTATGTAGC TTTAGCTCAG 
1151 CGCAAAGAAG AGCGCCAGGC TTACCTCACT AACGAGTATA TGCAGAGAAT 
1201 GGCAAGTGTA CGAGCTGTGC CCAACCAGCG AGCACCTCCT TCAGGTTACT 
1251 TCATGACAGC TGTCCCACAG ACTCAGAACC ATGCTGCATA CTATCCTCCT 
1301 AGCCAAATTG CTCGACTAAG ACCAAGTCCT CGCTGGACTG CTCAGGGTGC 
1351 CAGACCTCAT CCATTCCAAA ATAAGCCCAG TGCTATCCGC CCAGGTGCTC 
1401 CTAGAGTACC ATTTAGTACT ATGAGACCAG CTTCTTCACA GGTTCCACGA 
1451 GTCATGTCAA CGCAGCGTGT TGCTAACACA TCAACACAGA CAGTGGGTCC 
1501 ACGTCCTGCA GCTGCTGCTG CTGCTGCAGC TACCCCTGCT GTGCGCACGG 
1551 TTCCACGGTA TAAATATGCT GCGGGAGTTC GCAATCCTCA GCAACATCGT 
1601 AATGCACAGC CACAAGTTAC AATGCAACAG CTTGCTGTTC ATGTACAAGG 
1651 TCAGGAAACT TTGACTGCCT CCAGGTTGGC ATCTGCCCCT CCTCAAAAGC 
1701 AAAAGCAAAT GTTAGGTGAA CGGCTCTTTC CTCTTATTCA AGCCATGCAC 
1751 CCTACTCTTG CTGGGAAAAT CACTGGCATG TTGTTGGAGA TTGATAATTC 
1801 AGAACTTCTT TATATGCTCG AGTCTCCAGA GTCACTCCGT TCTAAGGTTG 
1851 ATGAAGCTGT AGCTGTACTA CAAGCCCACC AAGCTAAAGA GGCTACCCAG 
1901 AAAGCAGTTA ACAGTGCTAC CGGTGTTCCA ACTGTTTAAA ATTGATCAGA 
1951 GACCACGAAA AGAAATTTGT GCTTCACCGA AGAAAAATAT CTAAACATCG 
2001 AGAAACTATG GGAAAAAAAA TTGCAAAATC TAAAATAAAA AATGCAAAAT 
2051 CTAAAATAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2101 AAAAAGG 




>r (mRNA) 3'-poly(A) tail found oi 
been implicated in governing the 



BLAST Results 



Entry HSPOLYAB from database EMBL: 

Human mRNA for polyA binding protein 

Score = 5420, P = 0.0e+00, identities = 1162/1243 
Medline entries 

No Medline entry 

Peptide information for frame 2 



ORF from 707 bp to 1936 bp; peptide length: 410 
Category: strong similarity to known protein 
Classification: unset 
Prosite motifs: RNP_1 (10-18) 
RNP_1 (112-120) 



1 LMTDESGKSK GFGFVSFERH EDAQKAVDEM NGKELNGKQI YVGRAQKKVE 
51 RQTELKRTFE QMKQDRITRY QVVNLYVKNL DDGIDDERLR KAFSPFGTIT 
101 SAKVMMEGGR SKGFGFVCFS SPEEATKAVT EMNGRIVATK PLYVALAQRK 
151 EERQAYLTNE YMQRMASVRA VPNQRAPPSG YFMTAVPQTQ NHAAYYPPSQ 
201 IARLRPSPRW TAQGARPHPF QNKPSAIRPG APRVPFSTMR PASSQVPRVM 
251 STQRVANTST QTVGPRPAAA AAAAATPAVR TVPRYKYAAG VRNPQQHRNA 
301 QPQVTMQQLA VHVQGQETLT ASRLASAPPQ KQKQMLGERL FPLIQAMHPT 
351 LAGKITGMLL EIDNSELLYM LESPESLRSK VDEAVAVLQA HQAKEATQKA 
401 VNSATGVPTV 
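
The ORF coordinates, reading frame and peptide length quoted in the peptide headers of these entries are arithmetically linked. The following minimal Python sketch shows that relation, assuming 1-based inclusive coordinates that exclude the stop codon; the function name `orf_summary` is illustrative and not part of the original annotation pipeline.

```python
def orf_summary(start: int, end: int) -> tuple[int, int]:
    """Return (reading frame, peptide length) for a 1-based, inclusive ORF.

    Assumes the coordinates cover only sense codons (no stop codon).
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    frame = (start - 1) % 3 + 1   # frame 1 starts at nucleotide 1
    return frame, length // 3


print(orf_summary(707, 1936))  # ORF reported for frame 2 of this clone
print(orf_summary(45, 707))    # ORF reported for frame 3 of this clone
```

Under these assumptions the two calls give frame 2 with 410 residues and frame 3 with 221 residues, in agreement with the peptide headers of this clone.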

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8m10, frame 2 

PIR:DNHUPA polyadenylate-binding protein - human, N = 1, Score = 1931, 
P = 1.7e-199 

PIR:I48718 poly(A) binding protein - mouse, N = 1, Score = 1928, P = 
3.6e-199 

>PIR:DNHUPA polyadenylate-binding protein - human 
Length = 633 

HSPs: 

Score = 1931 (289.7 bits), Expect = 1.7e-199, P = 1.7e-199 
Identities = 384/415 (92%), Positives = 394/415 (94%) 

Query:   1 LMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRTFE 60 
           +MTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKR FE 
Sbjct: 219 

Query:  61 QMKQDRITRYQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMMEGGRSKGFGFVCFS 120 
           QMKQDRITRYQ VNLYVKNLDDGIDDERLRK FSPFGTITSAKVMMEGGRSKGFGFVCFS 
Sbjct: 279 QMKQDRITRYQGVNLYVKNLDDGIDDERLRKEFSPFGTITSAKVMMEGGRSKGFGFVCFS 338 

Query: 121 SPEEATKAVTEMNGRIVATKPLYVALAQRKEERQAYLTNEYMQRMASVRAVPN------Q 174 
           SPEEATKAVTEMNGRIVATKPLYVALAQRKEERQA+LTN+YMQRMASVRAVPN      Q 
Sbjct: 339 

Query: 175 RAPPSGYFMTAVPQTQNHAAYYPPSQIARLRPSPRWTAQGARPHPFQNKPSAIRPGAPRV 234 
            APPSGYFM A+PQTQN AAYYPPSQ+A+LRPSPRWTAQGARPHPFQN P AIRP APR 
Sbjct: 399 

Query: 235 PFSTMRPASSQVPRVMSTQRVANTSTQTVGPRPAAAAAAAATPAVRTVPRYKYAAGVRNP 294 
           PFSTMRPASSQVPRVMSTQRVANTSTQT+GPRPAAAAAAA TPAVRTVP+YKYAAGVRNP 
Sbjct: 459 PFSTMRPASSQVPRVMSTQRVANTSTQTMGPRPAAAAAAA-TPAVRTVPQYKYAAGVRNP 517 

Query: 295 QQHRNAQPQVTMQQLAVHVQGQETLTASRLASAPPQKQKQMLGERLFPLIQAMHPTLAGK 354 
           QQH NAQPQVTMQQ AVHVQGQE LTAS LASAPPQ+QKQMLGERLFPLIQAMHPTLAGK 
Sbjct: 518 QQHLNAQPQVTMQQPAVHVQGQEPLTASMLASAPPQEQKQMLGERLFPLIQAMHPTLAGK 577 

Query: 355 ITGMLLEIDNSELLYMLESPESLRSKVDEAVAVLQAHQAKEATQKAVNSATGVPTV 410 
           ITGMLLEIDNSELL+MLESPESLRSKVDEAVAVLQAHQAKEA QKAVNSATGVPTV 
Sbjct: 578 ITGMLLEIDNSELLHMLESPESLRSKVDEAVAVLQAHQAKEAAQKAVNSATGVPTV 633 

Score = 315 (47.3 bits), Expect = 1.9e-27, P = 1.9e-27 
Identities = 71/163 (43%), Positives = 102/163 (62%) 



Query:   1 LMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRTFE 60 
           ++ DE+G SKG+GFV FE E A++A+++MNG LN ++++VGR + + ER+ EL + 
Sbjct: 130 VVCDENG-SKGYGFVHFETQEAAERAIEKMNGMLLNDRKVFVGRFKSRKEREAELGARAK 188 

Query:  61 QMKQDRITRYQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMM-EGGRSKGFGFVCF 119 
           + N+Y+KN + +DDERL+ F P S KVM E G+SKGFGFV F 
Sbjct: 189 EF----------TNVYIKNFGEDMDDERLKDLFGP---ALSVKVMTDESGKSKGFGFVSF 235 

Query: 120 SSPEEATKAVTEMNGRIVATKPLYVALAQRKEERQAYLTNEYMQ 163 
           E+A KAV EMNG+ + K +YV AQ+K ERQ L ++ Q 
Sbjct: 236 ERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRKFEQ 279 



Score = 214 (32.1 bits), Expect = 1.9e-14, P = 1.9e-14 
Identities = 50/150 (33%), Positives = 87/150 (58%) 

Query:   8 KSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRTFEQMKQDRI 67 
           +S G+ +V+F++ DA++A+D MN + GK + + +Q R L+++ 
Sbjct:  50 RSLGYAYVNFQQPADAERALDTMNFDVIKGKPVRIMWSQ-------------RDPSLRKS 96 

Query:  68 TRYQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMMEGGRSKGFGFVCFSSPEEATK 127 
           V N+++KNLD ID++ L FS FG I S KV+ + SKG+GFV F + E A + 
Sbjct:  97 ---GVGNIFIKNLDKSIDNKALYDTFSAFGNILSCKVVCDENGSKGYGFVHFETQEAAER 153 

Query: 128 AVTEMNGRIVATKPLYVALAQRKEERQAYL 157 
           A+ +MNG ++ + ++V + ++ER+A L 
Sbjct: 154 AIEKMNGMLLNDRKVFVGRFKSRKEREAEL 183 



Score = 120 (18.0 bits), Expect = 4.8e-04, P = 4.8e-04 
Identities = 30/99 (30%), Positives = 54/99 (54%) 

Query:  70 YQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVM--MEGGRSKGFGFVCFSSPEEATK 127 
           Y + +LYV +L + + L + FSP G I S +V M RS G+ +V F P +A + 
Sbjct:   8 YPMASLYVGDLHPDVTEAMLYEKFSPAGPILSIRVCRDMITRRSLGYAYVNFQQPADAER 67 

Query: 128 AVTEMNGRIVATKPLYVALAQRKEE-RQAYLTNEYMQRM 165 
           A+ MN ++ KP+ + +QR R++ + N +++ + 
Sbjct:  68 ALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFIKNL 106 
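
The percentages printed next to the Identities and Positives counts in these HSP headers appear to be truncated rather than rounded (for example 394/415 is shown as 94%). A minimal Python sketch of that convention follows; the function name `blast_percent` is illustrative, and the truncation rule is an inference from the printed values, not a documented property of the program that produced them.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Percentage of matching columns, truncated to a whole number."""
    if aligned <= 0:
        raise ValueError("alignment length must be positive")
    return matches * 100 // aligned


# Identities and Positives of the first two frame 2 HSPs listed above
for matches, aligned in [(384, 415), (394, 415), (71, 163), (102, 163)]:
    print(f"{matches}/{aligned} ({blast_percent(matches, aligned)}%)")
```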



Peptide information for frame 3 



ORF from 45 bp to 707 bp; peptide length: 221 
Category: strong similarity to known protein 
Classification: unset 
Prosite motifs: RNP_1 (138-146) 



1 MNPSTPSYPT ASLYVGDLHP DVTEAMLYEK FSPAGPILSI RICRDLITSG 

51 SSNYAYVNFQ HTKDAEHALD TMNFDVIKGK PVRIMWSQRD PSLRKSGVGN 

101 IFVKNLDKSI NNKALYDTVS AFGNILSCNV VCDENGSKGY GFVHFETHEA 

151 AERAIKKMNG MLLNGRKVFV GQFKSRKERE AELGARAKEF PNVYIKNFGE 

201 DMDDERLKDL FGKFGPALSV N 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8m10, frame 3 

SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY(A) BINDING 
PROTEIN 1) (PABP 1)., N = 1, Score = 1039, P = 5.7e-105 

PIR:I48718 poly(A) binding protein - mouse, N = 1, Score = 1031, P = 
4e-104 

PIR:DNHUPA polyadenylate-binding protein - human, N = 1, Score = 1009, 
P = 8.7e-102 



>SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY(A) BINDING 
PROTEIN 1) (PABP 1). 
Length = 636 

HSPs: 



Score = 1039 (155.9 bits), Expect = 5.7e-105, P = 5.7e-105 
Identities = 199/220 (90%), Positives = 205/220 (93%) 

Query:   1 MNPSTPSYPTASLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSGSSNYAYVNFQ 60 
           MNPS PSYP ASLYVGDLHPDVTEAMLYEKFSPAGPILSIR+CRD+IT  S  YAYVNFQ 
Sbjct:   1 MNPSAPSYPMASLYVGDLHPDVTEAMLYEKFSPAGPILSIRVCRDMITRRSLGYAYVNFQ 60 

Query:  61 HTKDAEHALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFVKNLDKSINNKALYDTVS 120 
              DAE ALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIF+KNLDKSI+NKALYDT S 
Sbjct:  61 QPADAERALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFIKNLDKSIDNKALYDTFS 120 

Query: 121 AFGNILSCNVVCDENGSKGYGFVHFETHEAAERAIKKMNGMLLNGRKVFVGQFKSRKERE 180 
           AFGNILSC VVCDENGSKGYGFVHFET EAAERAI+KMNGMLLN RKVFVG+FKSRKERE 
Sbjct: 121 AFGNILSCKVVCDENGSKGYGFVHFETQEAAERAIEKMNGMLLNDRKVFVGRFKSRKERE 180 

Query: 181 AELGARAKEFPNVYIKNFGEDMDDERLKDLFGKFGPALSV 220 
           AELGARAKEF NVYIKNFGEDMDDERLKDLFGKFGPALSV 
Sbjct: 181 AELGARAKEFTNVYIKNFGEDMDDERLKDLFGKFGPALSV 220 

Score = 275 (41.3 bits), Expect = 4.1e-23, P = 4.1e-23 
Identities = 71/233 (30%), Positives = 120/233 (51%) 

Query: 2, 62, 118, 177 
Sbjct: 90, 150, 210, 270 

+PS ++++ +L LY+ FS G ILS ++ D + A 
++ M +R+ L R N+++KN + ++++ L D 
FG LS V+ DE+G SKG+GFV FE HE A++A+ +MNG LNG++++VG+ + + 
ER+ EL + ++ 
-NVYIKNFGEDMDDERLKDLFGKFGPALS 219 
N+Y+KN + +DDERL+ F FG S 

Score = 227 (34.1 bits), Expect = 6.3e-13, P = 6.3e-18 
Identities = 57/187 (30%), Positives = 101/187 (54%) 



Query:  12 SLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSGSSNYAYVNFQHTKDAEHALDT 71 
           ++Y+ + D+ + L + F GP LS+++ D + S + +V+F+ +DA+ A+D 
Sbjct: 192 NVYIKNFGEDMDDERLKDLFGKFGPALSVKVMTDE-SGKSKGFGFVSFERHEDAQKAVDE 250 

Query:  72 MNFDVIKGKPVRIMWSQR-----------------DPSLRKSGVGNIFVKNLDKSINNKA 114 
           MN + GK + + +Q+ D R GV N++VKNLD I+++ 
Sbjct: 251 MNGKELNGKQIYVGRAQKKVERQTELKRKFEQMKQDRITRYQGV-NLYVKNLDDGIDDER 309 

Query: 115 LYDTVSAFGNILSCNVVCDENGSKGYGFVHFETHEAAERAIKKMNGMLLNGRKVFVGQFK 174 
           L S FG I S V+ + SKG+GFV F + E A +A+ +MNG ++ + ++V + 
Sbjct: 310 LRKEFSPFGTITSAKVMMEGGRSKGFGFVCFSSPEEATKAVTEMNGRIVATKPLYVALAQ 369 

Query: 175 SRKEREAEL 183 
           ++ER+A L 
Sbjct: 370 RKEERQAHL 378 

Score = 100 (15.0 bits), Expect = 2.3e-02, P = 2.3e-02 
Identities = 26/99 (26%), Positives = 53/99 (53%) 

Query:   8 YPTASLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSG-SSNYAYVNFQHTKDAE 66 
           Y +LYV +L + + L ++FSP G I S ++ ++ G S + +V F ++A 
Sbjct: 291 YQGVNLYVKNLDDGIDDERLRKEFSPFGTITSAKV---MMEGGRSKGFGFVCFSSPEEAT 347 

Query:  67 HALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFVKNL 106 
           A+ MN ++ KP+ + +QR R++ + N +++ + 
Sbjct: 348 KAVTEMNGRIVATKPLYVALAQRKEE-RQAHLTNQYMQRM 386 



Pedant information for DKFZphtes3_8m10, frame 2 

Report for DKFZphtes3_8m10.2 



[LENGTH] 409 
[MW] 45235.68 
[pI] 10.08 
[HOMOL] SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY(A) BINDING PROTEIN 1) (PABP 1). 0.0 
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YER165w] 1e-54 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER165w] 1e-54 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YER165w] 1e-54 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YER165w] 1e-54 
[FUNCAT] 04.05.99 other mrna-transcription activities [S. cerevisiae, YNL016w] 1e-15 
[FUNCAT] 11.01 stress response [S. cerevisiae, YGR159c] 1e-12 
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YGR159c] 1e-12 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YNL175c] 4e-09 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YPR112c] 5e-08 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YHR086w] 3e-07 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YHR086w] 3e-07 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YHR086w] 3e-07 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YOL123w HRP1 - CF Ib] 9e-07 
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YCL011c] 3e-06 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGR250c] 8e-06 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR432w] 2e-05 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YDR432w] 2e-05 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YFR023w] 3e-05 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YBR212w] 3e-04 
[BLOCKS] BL00030B Eukaryotic RNA-binding region RNP-1 proteins 
[SCOP] d1sxl 4.34.7.1.3 Sex-lethal protein [(Drosophila melanogaster) 1e-17 
[PIRKW] nucleus 0.0 
[PIRKW] duplication 0.0 
[PIRKW] RNA binding 0.0 
[PIRKW] nucleolus 2e-09 
[PIRKW] tandem repeat 2e-09 
[PIRKW] single-stranded DNA binding 3e-06 
[PIRKW] DNA binding 5e-13 
[PIRKW] phosphoprotein 6e-10 
[PIRKW] ribosome 3e-08 
[PIRKW] mitochondrion 3e-08 
[PIRKW] alternative splicing 9e-11 
[PIRKW] chloroplast 2e-19 
[PIRKW] transcription regulation 2e-07 
[PIRKW] protein biosynthesis 3e-08 
[SUPFAM] nucleolin 6e-10 
[SUPFAM] glycine-rich RNA-binding protein 2e-07 
[SUPFAM] unassigned ribonucleoprotein repeat-containing proteins 2e-19 
[SUPFAM] polyadenylate-binding protein 0.0 
[SUPFAM] ribonucleoprotein repeat homology 0.0 
[PROSITE] RNP_1 2 
[PFAM] RNA recognition motif, (aka RRM, RBD, or RNP domain) 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 5.62 % 



SEQ MTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRTFEQ 
SEG 
1sxl- 

SEQ MKQDRITRYQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMMEGGRSKGFGFVCFSS 
SEG 
1sxl- CEEEECCCTTTTHHHHHHHHTTTTCCCCCEEECTTTCTTTEEEECTTT 

SEQ PEEATKAVTEMNGRIVATKPLYVALAQRKEERQAYLTNEYMQRMASVRAVPNQRAPPSGY 
SEG 
1sxl- HHHHHHHHHHHTTTCCCCCCCBCCBCC 

SEQ FMTAVPQTQNHAAYYPPSQIARLRPSPRWTAQGARPHPFQNKPSAIRPGAPRVPFSTMRP 
SEG 
1sxl- 

SEQ ASSQVPRVMSTQRVANTSTQTVGPRPAAAAAAAATPAVRTVPRYKYAAGVRNPQQHRNAQ 
SEG xxxxxxxxxxxxxxxxxxxxxxx 
1sxl- 

SEQ PQVTMQQLAVHVQGQETLTASRLASAPPQKQKQMLGERLFPLIQAMHPTLAGKITGMLLE 
SEG 
1sxl- 

SEQ IDNSELLYMLESPESLRSKVDEAVAVLQAHQAKEATQKAVNSATGVPTV 
SEG 
1sxl- 
Prosite for DKFZphtes3_8m10.2 

PS00030 9->17 RNP_1 PDOC00030 
PS00030 111->119 RNP_1 PDOC00030 
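
The two RNP_1 hits above can be approximated with a regular expression. The sketch below assumes that the PROSITE PS00030 consensus is [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM]; that pattern is recalled from the PROSITE database, not taken from this document, and should be checked against the current PROSITE entry. The function name `find_rnp1` is illustrative.

```python
import re

# Assumed PS00030 (RNP_1) consensus translated into regular-expression syntax.
RNP_1 = re.compile(r"[RK]G[^EDRKHPCG][AGSCI][FY][LIVA].[FYLM]")


def find_rnp1(peptide: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) positions of RNP_1 matches."""
    return [(m.start() + 1, m.end()) for m in RNP_1.finditer(peptide)]


# First 120 residues of the frame 2 sequence as used in the Pedant report
pedant_frame2 = (
    "MTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRTFEQ"
    "MKQDRITRYQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMMEGGRSKGFGFVCFSS"
)
print(find_rnp1(pedant_frame2))
```

Under that assumption the matches start at residues 9 and 111, the start positions listed in the Prosite table. The table's end positions (17 and 119) are one higher than the inclusive ends returned here, which suggests the report uses an exclusive end coordinate.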



Pfam for DKFZphtes3_8m10.2 

HMM_NAME RNA recognition motif, (aka RRM, RBD, or RNP domain) 

HMM       *IYVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED 
          +YV+NL+ +++E LR +FS+FG I+S+++M+ E GRS+GF+FV F + 
Query  74 LYVKNLDDGIDDERLRKAFSPFGTITSAKVMM--EGGRSKGFGFVCFSS 120 

HMM       EEDAekAIdeMNGmeFmGRrlRV* 
          +E+A+KA+ EMNG+++ ++++V 
Query 121 PEEATKAVTEMNGRIVATKPLYV 143 



Pedant information for DKFZphtes3_8m10, frame 3 

Report for DKFZphtes3_8m10.3 



[LENGTH] 235 
[MW] 26308.08 
[pI] 8.95 
[HOMOL] SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY(A) BINDING PROTEIN 1) (PABP 1). 1e-113 
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YER165w] 1e-64 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER165w] 1e-64 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YER165w] 1e-64 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YER165w] 1e-64 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YFR023w] 1e-24 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YFR023w] 1e-24 
[FUNCAT] 04.05.99 other mrna-transcription activities [S. cerevisiae, YNL016w] 2e-19 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YOR319w] 2e-14 
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YGR159c] 1e-11 
[FUNCAT] 11.01 stress response [S. cerevisiae, YGR159c] 1e-11 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGR250c] 1e-09 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YOL123w HRP1 - CF Ib] 1e-09 
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YCL011c] 8e-09 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YPR112c] 2e-08 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YHR086w] 2e-08 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YBR212w] 3e-08 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YBR212w] 3e-08 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR432w] 3e-04 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YDR432w] 3e-04 
[BLOCKS] BL00030B Eukaryotic RNA-binding region RNP-1 proteins 
[BLOCKS] BL00900D Bacteriophage-type RNA polymerase family proteins signatur 
[SCOP] d1sxl 4.34.7.1.3 Sex-lethal protein [(Drosophila melanogaster) 9e-23 
[SCOP] d2u1a 4.34.7.1.2 U1A protein [human (Homo sapiens) 6e-24 
[SCOP] d1up1_2 4.34.7.1.1 Nuclear ribonucleoprotein A1, RNP A1, UP 1e-13 
[PIRKW] nucleus 1e-110 
[PIRKW] duplication 1e-110 
[PIRKW] RNA binding 1e-110 
[PIRKW] nucleolus 4e-10 
[PIRKW] tandem repeat 4e-10 
[PIRKW] single-stranded DNA binding 1e-06 
[PIRKW] DNA binding 9e-12 
[PIRKW] phosphoprotein 4e-10 
[PIRKW] mitochondrion 6e-07 
[PIRKW] heterotrimer 4e-06 
[PIRKW] alternative splicing 1e-15 
[PIRKW] chloroplast 5e-11 
[PIRKW] transcription regulation 3e-09 
[PIRKW] GTP binding 2e-06 
[SUPFAM] helix-destabilizing protein 1e-07 
[SUPFAM] nucleolin 4e-10 
[SUPFAM] glycine-rich RNA-binding protein 2e-07 
[SUPFAM] yeast HRP1 protein 2e-08 
[SUPFAM] unassigned ribonucleoprotein repeat-containing proteins 3e-25 
[SUPFAM] polyadenylate-binding protein 1e-112 
[SUPFAM] ribonucleoprotein repeat homology 1e-112 
[PROSITE] RNP_1 1 
[PFAM] RNA recognition motif, (aka RRM, RBD, or RNP domain) 
[KW] All_Beta 
[KW] 3D 



SEQ ERSRLVCLRAAVPRMNPSTPSYPTASLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDL 
1ha1- EEEETTTTTTCHHHHHHHHGGGCCEEEEEEEETT 

SEQ ITSGSSNYAYVNFQHTKDAEHALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFVKNL 
1ha1- TTTCEEEEEEEEECCHHHHHHHHHHTTEEE-TT--EEEEEEECTTTTCCCCCEEEEECC 

SEQ DKSINNKALYDTVSAFGNILSCNVVCDENGSKGYGFVHFETHEAAERAIKKMNGMLLNGR 
1ha1- TTTTCHHHHHHHHGGGCCEEEEEEEETTTTTCEEEEEEECCHHHHHHHH 

SEQ KVFVGQFKSRKEREAELGARAKEFPNVYIKNFGEDMDDERLKDLFGKFGPALSVN 
1ha1- 



Prosite for DKFZphtes3_8m10.3 
PS00030 152->160 RNP_1 PDOC00030 

Pfam for DKFZphtes3_8m10.3 



HMM_NAME RNA recognition motif, (aka RRM, RBD, or RNP domain) 

HMM       *IYVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED 
          +YVG+L + D+TE +L + FS+ GPI+SIR+ RD T S +A+V+F+ 
Query  27 LYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSGSSNYAYVNFQH 75 

HMM       EEDAekAIdeMNGmeFmGRrlRV* 
          DAE A+D+MN ++ G+++R+ 
Query  76 TKDAEHALDTMNFDVIKGKPVRI 98 

HMM       *IYVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED 
          I+V+NL+ +++ L D S FG I+S++++ D + S+G++FV FE+ 
Query 115 IFVKNLDKSINNKALYDTVSAFGNILSCNVVCD--ENGSKGYGFVHFET 161 

HMM       EEDAekAIdeMNGmeFmGRrlRV* 
          +E+AE+AI +MNGM+++GR++ V 
Query 162 HEAAERAIKKMNGMLLNGRKVFV 184 
DKFZphtes3_8p7 



group: testes derived 

DKFZphtes3_8p7 encodes a novel 412 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 

2 EST hits (both from testis libraries) 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 2899 bp 

Poly A stretch at pos. 2870, polyadenylation signal at pos. 2852 



1 CCGACCCGCC CTGGGGTGCT GCGTGCGCTG CCTGCTCCCG CCTGAGGAAA 
51 ACACTGCCCA TGGCGCAAGG CCGGGAGCGC GACGAAGGCC CCCACTCCGC 
101 CGGCGGCGCG TCCTTGTCCG TGAGATGGGT GCAAGGATTC CCTAAGCAGA 
151 ATGTTCATTT GTCAACGACA ACACCATTTG CTACCCTTGT GGGAATTATG 
201 TAATATTTAT TAATATTGAA ACCAAGAAAA AGACTGTACT GCAGTGTAGT 
251 AATGGAATTG TGGGCGTCAT GGCAACTAAC ATCCCCTGTG AAGTTGTGGC 
301 TTTTTCTGAC CGGAAGCTAA AACCTCTCAT CTACGTATAC AGCTTTCCAG 
351 GATTGACCAG AAGGACCAAA TTGAAAGGCA ACATTCTCCT GGACTACACT 
401 TTACTTTCAT TCAGTTACTG TGGCACCTAC CTGGCTAGTT ACTCCTCTCT 
451 CCCAGAATTT GAACTGGCCC TTTGGAACTG GGAATCGAGT ATCATTTTGT 
501 GTAAGAAATC ACAGCCTGGA ATGGATGTGA ACCAAATGTC TTTTAACCCC 
551 ATGAACTGGC GCCAGCTGTG CTTATCAAGT CCAAGTACAG TGAGCGTGTG 
601 GACCATTGAA AGAAGTAACC AGGAGCATTG TTTCAGAGCA AGGTCGGTGA 
651 AATTACCTCT AGAAGATGGG TCATTTTTTA ATGAAACGGA TGTCGTTTTC 
701 CCCCAGTCGT TGCCGAAAGA TCTCATCTAT GGTCCCGTGC TGCCACTGTC 
751 AGCCATTGCC GGGCTGGTAG GCAAAGAGGC AGAGACTTTC CGGCCGAAAG 
801 ATGATCTATA TCCTTTGCTT CACCCGACTA TGCATTGCTG GACTCCAACA 
851 AGTGACTTGT ACATTGGCTG TGAAGAGGGT CATCTTTTAA TGATTAATGG 
901 AGACACCTTG CAAGTGACTG TACTTAATAA GATAGAAGAG GAATCGCCAT 
951 TGGAAGACAG AAGAAATTTT ATCAGTCCAG TAACCTTGGT ATATCAGAAG 
1001 GAGGGCGTGC TGGCTTCTGG AATTGATGGC TTTGTGTATT CTTTTATTAT 
1051 TAAAGATAGA AGTTACATGA TCGAGGATTT TCTTGAGATT GAAAGACCTG 
1101 TAGAACATAT GACATTTTCT CCCAATTATA CAGTGTTGCT GATTCAAACA 
1151 GACAAGGGAT CTGTTTATAT CTACACTTTT GGTAAGGAGC CAACCTTAAA 
1201 TAAAGTCCTA GATGCTTGTG ATGGGAAATT TCAGGCAATT GACTTTATCA 
1251 CACCTGGAAC CCAATACTTC ATGACACTTA CATATTCAGG GGAAATTTGT 
1301 GTTTGGTGGC TGGAGGATTG TGCTTGTGTA AGCAAGATTT ATCTGAATAC 
1351 CCTAGCAACG GTTCTGGCTT GCTGTCCATC CTCCCTCTCT GCAGCCGTGG 
1401 GCACGGAGGA TGGCTCGGTC TACTTCATCA GCGTATATGA TAAGGAATCC 
1451 CCTCAGGTCG TGCACAAGGC CTTTCTCTCG GAATCGTCCG TGCAGCACGT 
1501 CGTGTAAGTC CTTTCTGCCT CCAGGAGCGG CTCCGTGTCA CACCCGTCTG 
1551 TTGAAAATTC TAGTGAAGCC ATCCTTTCTT TTAATTTTAA GTTTTACGTG 
1601 TTTCATTTGT TTTGAATGTT AATATATTCA CACAGTTCAA CACTCAAAAG 
1651 GTACAGAGGG CTGTGTAGTA AAGTACCCCC CATACCCAGG TCTGTCCTTG 
1701 CAGGCAGCCT GGTACCAATT TCTCATGTCT CTCCTGAGAT GTTTTATCCA 
1751 TGAACAAGCA AAACATAATA AGCACTTCTT TTTACTTGTA TCAATGGCCA 
1801 TCATGTGTGT ATAGTGTGCC AGGCACTTCT GCTGTATTAA CTCCATGAGG 
1851 TAAACACTCT TGTTGTCTCT ATTTGACAGG TGAGGAAGAT AAGGCACAAG 
1901 GATTTTAAAT AACTTGCTCA ATAGTACACA GATAGTGAAT GGCAAATGTT 
1951 GGGATTTGAA CCCAGGTAGT TGGGCTGCAG AGTCACTGCC TTTGCTCTTA 
2001 AAAGGAGAAA ACTATGTACA ATGCCTCATT TCTTTTTTCA CTTAATCGTA 
2051 TATCTTGGAG AATGTTTTAT ATCCACACAT AAAGACCAGC CTGATTATTT 
2101 GTATAGCCAC ATAGTATTCC ATTATATGAA TATACTATCA TTTTTTAAAA 
2151 ACGGTATATT AATGAACATT TAGAGTATTT CAAAACTTTT GAAGCAATAC 
2201 TTTTAAGATG ATAATATAGA GACATTAGAT TTGGACTTGT AGGTGCTATC 
2251 ATTATTACTG TTTCTTTTTA ATTTATTATA TTATTAGGTA TTAATAAGAA 
2301 CAGACATTTG TATTCTGCTT TACAGCTTGA GATCACTGTA GCTTGTGGCA 
2351 TGTGATCCTC AAAACACCAG TCAGAAAGGT GTTATTCTTA TCCCTATTAG 
2401 ACAAATTAGG GAATTCAGGG TTAGAGAGGT GAGGAAAAGC ATTGTCCAAG 
2451 ATTACACATT ACACAGCTAG CACACTGAGG AGCTGGCCCT GCCACTGTGG 
2501 ACTGCCCAGC TCCACCACCC TAGCTCAGTG GGGAAGGATG GATAACCTCC 
2551 TTCCATTTAC CCCCTGCCTT TCTGCACTGT CATTTTTTTG TGCCTTTCCT 
2601 TTCTCAGATC CTCTTATTCT AATTTACATC TTCCCACTTT TTCTAATTTG 
2651 ATAAAGTTGT AGACATGTTT CACTACATTC TTCCTCCCAC TGCCAGGTAC 
2701 CAGACACAGG GTAATGAAAT GTCACACCCA CCACTAATTT GAGAATTGCT 
2751 TATTTGCGCT TGAAACATCA AGAAAGCTCT ACCGACAGAC ATGTTTCATT 
2801 CACTTATGAT GAACCAACTG CCCATCTTTA CTGAATCTTC TTGACTGTAT 
2851 TTATTAAAGT TGCAATTTGG AAATAAAAAA AAAAAAAAAA AAAAAAAGG 



BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 



Peptide information for frame 2 



ORF from 269 bp to 1504 bp; peptide length: 412 
Category: putative protein 
Classification: no clue 



1 MATNIPCEVV AFSDRKLKPL IYVYSFPGLT RRTKLKGNIL LDYTLLSFSY 
51 CGTYLASYSS LPEFELALWN WESSIILCKK SQPGMDVNQM SFNPMNWRQL 
101 CLSSPSTVSV WTIERSNQEH CFRARSVKLP LEDGSFFNET DVVFPQSLPK 
151 DLIYGPVLPL SAIAGLVGKE AETFRPKDDL YPLLHPTMHC WTPTSDLYIG 
201 CEEGHLLMIN GDTLQVTVLN KIEEESPLED RRNFISPVTL VYQKEGVLAS 
251 GIDGFVYSFI IKDRSYMIED FLEIERPVEH MTFSPNYTVL LIQTDKGSVY 
301 IYTFGKEPTL NKVLDACDGK FQAIDFITPG TQYFMTLTYS GEICVWWLED 
351 CACVSKIYLN TLATVLACCP SSLSAAVGTE DGSVYFISVY DKESPQVVHK 
401 AFLSESSVQH VV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8p7, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_8p7, frame 2 

Report for DKFZphtes3_8p7 . 2 

[LENGTH] 412 

[MW] 46476.62 

[pI] 4.91 

[KW] Alpha_Beta 

SEQ MATNIPCEVVAFSDRKLKPLIYVYSFPGLTRRTKLKGNILLDYTLLSFSYCGTYLASYSS 
PRD cccccceeeeeecccccceeeeeecccccccccccchhhhhhhheeeecccccccccccc 

SEQ LPEFELALWNWESSIILCKKSQPGMDVNQMSFNPMNWRQLCLSSPSTVSVWTIERSNQEH 

PRD cchhhhhhhhccccceeeccccccccceeeccccccceeeeeccccceeeeeeeecchhh 

SEQ CFRARSVKLPLEDGSFFNETDVVFPQSLPKDLIYGPVLPLSAIAGLVGKEAETFRPKDDL 
PRD hhhhhhhcccccccccccccccccccccccccccccccceeeeeeccccccccccccccc 

SEQ YPLLHPTMHCWTPTSDLYIGCEEGHLLMINGDTLQVTVLNKIEEESPLEDRRNFISPVTL 
PRD cccccccccccccccceeeecccceeeecccceeeeeehhhhhcccccccccccccccee 

SEQ VYQKEGVLASGIDGFVYSFIIKDRSYMIEDFLEIERPVEHMTFSPNYTVLLIQTDKGSVY 
PRD eeeceeeeecccceeeeeeeeeccchhhhhhhhhhcccceeeccccceeeeeecccccee 

SEQ IYTFGKEPTLNKVLDACDGKFQAIDFITPGTQYFMTLTYSGEICVWWLEDCACVSKIYLN 
PRD eeeccccccchhhhhcccccceeeeeccccceeeeeeeccceeeeeeecceeeeeeeehh 

SEQ TLATVLACCPSSLSAAVGTEDGSVYFISVYDKESPQVVHKAFLSESSVQHVV 
PRD hhhhhhhccccccceeeeccccceeeeeeeccccccchhhhhhhcccccccc 

(No Prosite data available for DKFZphtes3_8p7.2) 
(No Pfam data available for DKFZphtes3_8p7.2) 
DKFZphtes3_9e22 



group: testes derived 

DKFZphtes3_9e22 encodes a novel 227 amino acid protein with weak partial similarity to RING 
finger proteins. 

For the novel protein, Pfam, but not Prosite, predicts a C3HC4 type RING finger motif. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to zinc finger proteins 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1318 bp 

Poly A stretch at pos. 1308, no polyadenylation signal found 



1 GCTCCCCCGG CTTTCGGAGC CCGGGGGCGG CCTGTGGCGC GCGGAGCCCG 
51 CGCCGGACTG CGCCTCTTTG GACCTTGAGG GGAAACATGC GTTTGCCTTG 
101 GATCGTTTGA AATTCTAAGT TTGGGATCCC CGCCCGCCCG CCTGCCTCTT 
151 CCGCCCCGCG GGTTTTTTCC TTTTTTCCTT TTGCTTTTTT TCCTTTTCTC 
201 CCTCCGGGTC TCCTTTTTGA CTCCCTCCCC CTTTATGCTC GCCCAGCCCT 
251 CCCCCTGCTG CTGAGAAGTG GGGGAGGGTC TCGGCCTCCA GGTTCCCGCC 
301 CCACCGGGGC CCGGGCGAGC ATGGGGGGCA AGCAGAGCAC GGCGGCCCGC 
351 TCCCGGGGCC CCTTCCCGGG GGTCTCCACC GATGACAGCG CCGTGCCGCC 
401 GCCGGGAGGG GCGCCCCATT TCGGGCACTA CCGGACGGGC GGCGGGGCCA 
451 TGGGGCTGCG CAGCCGCTCG GTCAGCTCGG TGGCAGGCAT GGGCATGGAC 
501 CCCAGCACGG CCGGGGGGGT GCCCTTTGGC CTCTACACCC CCGCCTCCCG 
551 GGGCACCGGC GACTCCGAGA GGGCGCCCGG CGGCGGAGGG TCTGCGTCCG 
601 ACTCCACCTA TGCCCATGGC AATGGTTACC AGGAGACGGG CGGCGGTCAC 
651 CATAGAGACG GGATGCTGTA CCTGGGCTCC CGAGCCTCGC TGGCGGATGC 
701 TCTACCTCTG CACATCGCAC CCAGGTGGTT CAGCTCGCAT AGTGGTTTCA 
751 AGTGCCCCAT TTGCTCCAAG TCTGTGGCTT CTGACGAGAT GGAAATGCAC 
801 TTTATAATGT GTTTGAGCAA ACCTCGCCTC TCCTACAACG ATGATGTGCT 
851 GACTAAAGAC GCGGGTGAGT GTGTGATCTG CCTGGAGGAG CTGCTGCAGG 
901 GGGACACGAT AGCCAGGCTG CCCTGCCTGT GCATCTATCA CAAAAGCTGC 
951 ATAGACTCGT GGTTTGAAGT GAACAGATCT TGTCCGGAAC ACCCTGCGGA 
1001 CTGACCTGCG GGCTTGCTTG CTGACTCCTC TCAAAGGGAC AGAGCGCCCC 
1051 TGCTCCAGGG AGGAGGCTCA CCGGACCCTG GGGCAGAGCT GAGCTTGGGA 
1101 CACCAGCGGG AACAGGGCAC CCCTTCTGCA CTGACTTCCA GATCATGGTT 
1151 CTCCCTTCCT CCCTGAGGAC ACCAAATTGG ATGAGAGCAA GTTTGAGAGA 
1201 AGAATGAATC AACTGCTATC CTTCCCCTCA CCCCTCAGCC CAGGAGGGAA 
1251 AGGGCATTTT CTTTTTCATC TTTGAAAGGC ATTGTGGGTC TGTCTTTAAA 
1301 GTGTTTACAA AAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 321 bp to 1001 bp; peptide length: 227 
Category: similarity to known protein 
Classification: unclassified 



1 MGGKQSTAAR SRGPFPGVST DDSAVPPPGG APHFGHYRTG GGAMGLRSRS 

51 VSSVAGMGMD PSTAGGVPFG LYTPASRGTG DSERAPGGGG SASDSTYAHG 

101 NGYQETGGGH HRDGMLYLGS RASLADALPL HIAPRWFSSH SGFKCPICSK 

151 SVASDEMEMH FIMCLSKPRL SYNDDVLTKD AGECVICLEE LLQGDTIARL 
201 PCLCIYHKSC IDSWFEVNRS CPEHPAD 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_9e22, frame 3 

TREMBL:AF078823_1 product: "RING-H2 finger protein RHA2b"; Arabidopsis 
thaliana RING-H2 finger protein RHA2b mRNA, complete cds., N = 1, Score 
= 111, P = 2.8e-06 

TREMBL:AF078822_1 product: "RING-H2 finger protein RHA2a"; Arabidopsis 
thaliana RING-H2 finger protein RHA2a mRNA, complete cds., N = 1, Score 
= 112, P = 6.6e-06 

TREMBL:AC004138_14 gene: "T17M13.17"; Arabidopsis thaliana chromosome 
II BAC T17M13 genomic sequence, complete sequence., N = 2, Score = 123, 
P = 1.4e-05 

PIR:T02286 hypothetical protein T13D8.23 - Arabidopsis thaliana, N = 1, 
Score = 142, P = 8.8e-08 



>PIR:T02286 hypothetical protein T13D8.23 - Arabidopsis thaliana 
Length = 327 

HSPs: 

Score = 142 (21.3 bits), Expect = 8.8e-08, P = 8.8e-08 
Identities = 24/57 (42%), Positives = 30/57 (52%) 

Query: 166 SKPRLSYNDDVLTKDAGECVICLEELLQGDTIARLPCLCIYHKSCIDSWFEVNRSCP 222 

S P + LT D +C +C+EE + G LPC IYHK CI W +N SCP 

Sbjct: 206 SLPSVKITPQHLTNDMSQCTVCMEEFIVGGDATELPCKHIYHKDCIVPWLRLNNSCP 262 



Pedant information for DKFZphtes3_9e22, frame 3 

Report for DKFZphtes3_9e22.3 

[LENGTH] 227 
[MW] 23782.62 
[pI] 6.18 
[HOMOL] PIR:T02286 hypothetical protein T13D8.23 - Arabidopsis thaliana 2e-08 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YDR313c] 4e-06 
[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YOL013c] 0.001 
[FUNCAT] 06.13 proteolysis [S. cerevisiae, YOL013c] 0.001 
[PFAM] Zinc finger, C3HC4 type (RING finger) 
[KW] Irregular 



SEQ MGGKQSTAARSRGPFPGVSTDDSAVPPPGGAPHFGHYRTGGGAMGLRSRSVSSVAGMGMD 

PRO cccccccccccccccccccccccccccccccccccccccccccccccccceeeccccccc 

SEQ PSTAGGVPFGLYTPASRGTGDSERAPGGGGSASDSTYAHGNGYQETGGGHHRDGMLYLGS 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccceeech 

SEQ RASLADALPLHIAPRWFSSHSGFKCPICSKSVASDEMEMHFIMCLSKPRLSYNDDVLTKD 

PRD hhhhhhhhceeecccccccccccccccccccchhhhhhhhhhhhcccccccccccccccc 

SEQ AGECVICLEELLQGDTIARLPCLCIYHKSCIDSWFEVNRSCPEHPAD 

PRD cceeeeeecccccccccccccceeeeeeccchhhhhhhhcccccccc 



(No Prosite data available for DKFZphtes3_9e22.3) 

Pfam for DKFZphtes3_9e22.3 

HMM_NAME Zinc finger, C3HC4 type (RING finger) 

HMM       *CPICFcTFQlDyPWPFdePmMlPCgHsFCypCIrrW CPmC* 
          C IC L+++ D++ LPC+ ++ ++CI +W CP+ 
Query 184 CVICLEELLQGDTIARLPCLCIYHKSCIDSWFEVNRSCPEH 224 
DKFZphtes3_9i20 



group: testes derived 

DKFZphtes3_9i20 encodes a novel 205 amino acid protein with similarity to the human KIAA0336 gene. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 

complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 

Locus: /map="44.1 cR from top of Chr17 linkage group" 
Insert length: 2509 bp 

Poly A stretch at pos. 2499, polyadenylation signal at pos. 2481 



1 CTCGCCGAGA TGACCTGGGC ACCTCTGCGT TGAATCGGCA AATACTGATC 
51 AAGCCGCATT TATTCTGCTC TCAGGAACTC TAAGTCTAGC AGAGAAGATG 
101 AGGCGGTAGA AGTTCATCAA TGGCTTGGCT GGAGGACAAG CAAATTGAGG 
151 ACATTGGCAA CGGAGTGATC AAAATGATAG ATCATGAGGC CTAAAATGAA 
201 TAAGGAAAGA AGAGAAGTGG CAGAGGCTGA GAACAGAAAG AGAGGGTGGA 
251 GGGGCTGTAA ATCTTGAAGA TTAGGGTATA ATATGAGTAT ATGGGTAAGA 
301 ATTGGAAGAA TTGTGTAGGA GGCAGTAGTC AAAAAGTAGA AGCAGTTTGG 
351 AAGAGTAGTT ACAAATATCA AGAGCCAGGT GGCTAAAAGG TGGAGCTATA 
401 GGTCATTGAA GCTCAAGAAA CTGAGTCTCT AGGGCATTGG TTAAGTCATC 
451 TGTCTAGACT TCAAAGTTGT CTAGGATGAT AATTCAGAAG ACTGATCTGT 
501 GCCAAAGTCA CAGGTTTTTC ACGACTGAAA ACAACATAGC AAAATAAGCC 
551 AAGATGTCTG TGGATCCAAT GACCTACGAG GCCCAGTTCT TTGGCTTCAC 
601 GCCACAAACG TGCATGCTTC GGATCTACAT TGCATTTCAA GACTACCTAT 
651 TTGAAGTGAT GCAGGCCGTT GAACAGGTTA TTCTGAAGAA GCTGGATGGC 
701 ATCCCAGACT GTGACATTAG CCCAGTGCAG ATTCGCAAAT GCACAGAGAA 
751 GTTTCTTTGC TTCATGAAAG GACATTTTGA TAACCTTTTT AGCAAAATGG 
801 AGCAACTGTT TTTGCAGCTG ATTTTACGTA TTCCCTCAAA CATCTTGCTT 
851 CCTGAAGATA AATGTAAGGA GACACCTTAT AGTGAGGAAG ATTTTCAGCA 
901 TCTCCAGAAA GAAATTGAAC AGTTACAGGA GAAGTACAAG ACTGAATTAT 
951 GTACTAAGCA GGCCCTTCTT GCAGAATTAG AAGAGCAAAA AATTGTTCAG 
1001 GCCAAACTCA AACAGACGTT GACTTTCTTT GATGAGCTTC ATAATGTTGG 
1051 CAGAGATCAT GGGACTAGTG ATTTTAGGGA GAGTTTAGTA TCCCTGGTTC 
1101 AGAACTCCAG AAAACTACAG AACATTAGAG ACAATGTGGA AAAGGAATCG 
1151 AAACGACTGA AAATATCTTA ATTGCTCAGT AGTCAAAAGG AGGAGCCTGT 
1201 CAAAAAGTAG AATCATAAGG ACTGTTCAAA CCATAAGGAC TGTTCAAATC 
1251 ATACCAGTGA CTGTTCAAAC CAACCATACT TTTTATTAGA TTTGCTTTGT 
1301 CAACTCTTTC TTGTATTCTG TGTTTTCCTC TTTTTTGGTC CACTTTGCTG 
1351 AGGTATGAAG TGTACTACTT TGAACTAGGC TGAAGCATCT GAGTCTTCTA 
1401 ATAAGTGGGA AGGGATCCAA CAAAGAAGCC ATGACCAGTT AAAGATATTT 
1451 GCAGAGTTAC ACCTTGGTCA TAAGTCCTTT GTGACCTTGA TTATTTTGGC 
1501 TTACTCTTTG GATGAGACCA GACAAGAAAA GGATTAAACG GGTGGCTCCT 
1551 TTAATATTAT TATTATTGTT TTTGAGACAA GGTCCCTTTC TGTCACCCAG 
1601 GTTAGAGTAG ATTTCAGTGG CACAATCTTG GCTCACTGCA ACCTCTGTGT 
1651 CCTGGGCTCA AGTGATCCTC CTGCCTCAGC CTCCCAAGTA GCTAGGACCA 
1701 CAGGTGCGTG TCACCATGCT TGGCTAATTT TTTTGCAGAA ACGAGGCCTC 
1751 ACTATATTGT CCAGGCTGAG TGGCTCTTTT ATTAACCAGT CATTACACTG 
1801 CGGAACAGCC AACATAGAGT ACTTGCTCTC GTCCTGTGAA TTTTCTTTCA 
1851 TGAGGGAGTC AATATGTAGT GGAAAGAAGC ATGTAGCAAA AAAGACAACC 
1901 TTGATCTTTA ATAAAAAAGA AGTTGGTTTA TTTCCAAAAT AAATCCCCTG 
1951 ACAAAAAACC TGGTGATGTT AAGCAATTGA CTGTCTTAGA GTCCAGCAGA 
2001 AGACCTTAGA CAAAAAAAGC AGAACCCACT GGAGTAGAAA AGGAAGCATG 
2051 TAGCATATAC TCAGTAGTGA AATTTAATTT TACTGACTGT TAGGTATCTA 
2101 TGCCAATTTG TTTTCATACT TCAGTTGGTT TTGGAATCTG CCTTATACCT 
2151 AATATTTATT TATTCACACT CATAAGCATC AAATATTTAA TGCCCTCAGT 
2201 GGGAAATTTG TGTTTAAACT CAATGGAATC TAATATTTCT TTATGTCGTT 
2251 AGTCCCTGTA AAATGTTAGG TCACCCAAGG AAAGGGGAGA AATAGCAATG 
2301 GTTGTTCCTA AGGTATTGCT TGCCCTCCAT GTCTTCCTAA AGAGCAGAAC 
2351 TTGGAGTTTC TCCTTTATGT AGAGAAGAAG TAACTTAGGG TGTATTTGCA 
2401 ATGAAATATT CATAGATATT GAAAGCTTGT GTTTACATGA AATATGTTTA 
2451 TTATCAAGAA GTCCTTTTTC CAATTCTGTA CATTAAATAT ATGTGTTTTA 
2501 AAAAAAAAA 
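
The header of this clone reports a poly(A) stretch at position 2499 and a polyadenylation signal at position 2481. The rules used by the original annotation pipeline are not given, so the following Python sketch is only one plausible reading: it takes the last run of at least eight A residues as the poly(A) stretch and looks for the hexamers AATAAA or ATTAAA within 40 nucleotides upstream, reporting 0-based positions. The function name `polya_features`, the hexamer list, the run length and the window are all assumptions.

```python
import re

# Canonical hexamer and a common variant; assumed, not stated in the source.
SIGNALS = ("AATAAA", "ATTAAA")


def polya_features(cdna, offset=0, min_run=8, window=40):
    """Return (poly(A) start, signal start) as 0-based positions.

    Either value is None when the feature is not found. `offset` is the
    0-based position of cdna[0] within the full insert.
    """
    runs = list(re.finditer("A{%d,}" % min_run, cdna))
    if not runs:
        return None, None
    tail = runs[-1].start()                      # last long run of A residues
    lo = max(0, tail - window)
    best = max(cdna.rfind(s, lo, tail) for s in SIGNALS)  # -1 if no hexamer
    return offset + tail, (offset + best if best >= 0 else None)


# Last 59 nucleotides of the insert above (sequence lines 2451 and 2501)
tail_of_9i20 = ("TTATCAAGAAGTCCTTTTTCCAATTCTGTACATTAAATATATGTGTTTTA"
                "AAAAAAAAA")
print(polya_features(tail_of_9i20, offset=2450))  # prints (2499, 2481)
```

For this clone the sketch reproduces both header values when positions are counted from zero. That agreement does not establish that the pipeline used these rules, and other clones in this listing may not be reproduced by them.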



BLAST Results 
Entry AC004148 from database EMBL: 

Homo sapiens chromosome 17, clone HCIT524C5, complete sequence. 
Score = 5245, P = 0.0e+00, identities = 1049/1049 
3 exons 



Entry HS556361 from database EMBL: 
human STS TIGR-A003N29. 

Score = 1005, P = 1.3e-39, identities = 201/201 

Entry HSG04 3 from database EMBL: 
human STS SHGC-36031. 
Score = 955, P = 2.8e-37, identities = 205/215 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 554 bp to 1168 bp; peptide length: 205 
Category: putative protein 
Classification: no clue 



1 MSVDPMTYEA QFFGFTPQTC MLRIYIAFQD YLFEVMQAVE QVILKKLDGI 

51 PDCDISPVQI RKCTEKFLCF MKGHFDNLFS KMEQLFLQLI LRIPSNILLP 

101 EDKCKETPYS EEDFQHLQKE IEQLQEKYKT ELCTKQALLA ELEEQKIVQA 

151 KLKQTLTFFD ELHNVGRDHG TSDFRESLVS LVQNSRKLQN IRDNVEKESK 
201 RLKIS 



BLASTP hits



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_9i20, frame 2

TREMBLNEW:HSAB2334_1 gene: "KIAA0336"; Human mRNA for KIAA0336 gene,
complete cds., N = 1, Score = 107, P = 0.0081



>TREMBLNEW:HSAB2334_1 gene: "KIAA0336"; Human mRNA for KIAA0336 gene,
complete cds.

Length = 1,583



HSPs: 



Score = 107 (16.1 bits), Expect = 8.2e-03, P = 8.1e-03
Identities = 42/140 (30%), Positives = 76/140 (54%)

Query:  65 EKFLCFMKGHFDNLFSKMEQLFLQLILRIPSNILLPEDKCKETPYSEED----FQHLQKE 120
           EK  CF+K H +NL   +EQ   +L  R    ILL +D   ++P  + D     + L+++
Sbjct: 796 EKEKCFIKEH-ENLKPLLEQK--ELRDRRAELILL-KDSLAKSPSVKNDPLSSVKELEEK 851

Query: 121 IEQLQE--KYKTELCTKQALLAELEEQKIVQAKLKQTLTFFDELHNVGRDHGTSDFRESL 178
           IE L++  K K E   K  L+A ++ +K + +  K+T T  +EL ++  +        S+
Sbjct: 852 IENLEKECKEKEEKINKIKLVA-VKAKKELDSSRKETQTVKEELESLRSEK--DQLSASM 908

Query: 179 VSLVQNSRKLQNIRDNVEKESKRLKI 204
             L+Q +   +N+    EK+S++L +
Sbjct: 909 RDLIQGAESYKNLLLEYEKQSEQLDV 934



Pedant information for DKFZphtes3_9i20, frame 2 



Report for DKFZphtes3_9i20.2



[LENGTH] 205

[MW] 24140.13

[pI] 5.51

[KW] All_Alpha

[KW] COILED_COIL 18.05 %






WO 01/12659 



PCT/IB00/01496 



SEQ MSVDPMTYEAQFFGFTPQTCMLRIYIAFQDYLFEVMQAVEQVILKKLDGIPDCDISPVQI 

PRD cccccchhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccc

COILS 

SEQ RKCTEKFLCFMKGHFDNLFSKMEQLFLQLILRIPSNILLPEDKCKETPYSEEDFQHLQKE 

PRD cccchhhhhhhcccccchhhhhhhhhhhhhhhcccceeeccccccccccchhhhhhhhhh 

COILS cccccccccc

SEQ IEQLQEKYKTELCTKQALLAELEEQKIVQAKLKQTLTFFDELHNVGRDHGTSDFRESLVS 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccchhhhhhhh 

COILS ccccccccccccccccccccccccccc 

SEQ LVQNSRKLQNIRDNVEKESKRLKIS

PRD hhcccchhhhhhhhhhhhhhhcccc 

COILS 

(No Prosite data available for DKFZphtes3_9i20.2)
(No Pfam data available for DKFZphtes3_9i20.2)
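
The COILED_COIL keyword value in the report above can be cross-checked by simple arithmetic: 18.05 % of the 205-residue peptide corresponds to 37 residues flagged in the COILS track. A minimal Python sketch of that check follows. The function name `coverage_percent` and the idea of passing the annotation track as a plain string are illustrative assumptions, not part of the Pedant output format.

```python
def coverage_percent(track: str, protein_length: int, mark: str) -> float:
    """Percentage of residues carrying `mark` in an annotation track.

    `track` is assumed to be the concatenated annotation line(s) for one
    protein; characters other than `mark` (spaces, padding) are ignored.
    """
    flagged = track.count(mark)
    return round(100.0 * flagged / protein_length, 2)


# Hypothetical track with 10 + 27 = 37 flagged positions, padded with spaces.
example_track = "c" * 10 + " " * 5 + "c" * 27
print(coverage_percent(example_track, 205, "c"))
```

The same helper applies to any per-residue track that uses a single marker character, provided the track and the stated protein length refer to the same sequence.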



group: testes derived 

DKFZphtes3_9k22 encodes a novel 304 amino acid protein with partial similarity to X. laevis
katanin p80.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to C-terminus of katanin p80 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2676 bp 

Poly A stretch at pos. 2665, no polyadenylation signal found 



1 CTCTCTAGGC TGCCGGGCGC TGGTCGTCAG CGCCGAGGCT GGGCTGAGGC 
51 GCCGCGGTAC CATGAGGCGC CGGTACTTAA GAGATTATGG CATCAGAAAC 
101 CCACAATGTT AAAAAACGGA ACTTTTGTAA TAAGATTGAG GATCATTTCA 
151 TTGATCTTCC TAGAAAAAAG ATCTCTAATT TCACTAATAA GAACATGAAG 
201 GAGGTTAAGA AATCTCCAAA ACAGTTGGCT GCTTACATAA ATAGAACAGT 
251 TGGACAAACT GTGAAAAGCC CAGATAAACT TCGTAAAGTG ATCTATCGCA
301 GAAAGAAAGT TCATCATCCC TTTCCAAATC CTTGTTACAG AAAAAAACAG 
351 TCCCCTGGAA GTGGGGGCTG TGACATGGCA AATAAAGAAA ATGAACTGGC 
401 TTGTGCAGGC CACCTGCCTG AAAAATTACA CCATGATAGT CGAACATATT 
451 TGGTTAACTC CAGTGATTCT GGTTCTTCAC AGACAGAAAG CCCATCATCA 
501 AAATATAGTG GGTTTTTTTC TGAGGTTTCT CAGGACCATG AAACAATGGC 
551 CCAAGTTTTG TTCAGCAGGA ATATGAGATT GAATGTAGCT TTAACTTTCT
601 GGAGAAAGAG AAGTATAAGT GAACTTGTAG CTTATTTGTT GAGGATAGAA 
651 GATCTTGGCG TTGTGGTAGA TTGCCTTCCT GTGCTCACCA ATTGTTTACA 
701 GGAAGAAAAA CAATATATCT CACTTGGCTG CTGTGTTGAC TTGTTGCCTC 
751 TAGTAAAGTC ACTACTTAAA AGCAAATTTG AAGAATATGT TATAGTTGGT 
801 TTAAACTGGC TTCAAGCAGT CATTAAAAGG TGGTGGTCAG AACTATCATC 
851 CAAAACAGAA ATTATAAATG ATGGAAATAT TCAAATTTTA AAACAACAAT 
901 TAAGTGGATT ATGGGAACAG GAAAACCATC TTACTTTGGT TCCAGGATAT 
951 ACTGGTAATA TAGCTAAGGA TGTAGATGCT TATTTATTAC AGTTACATTG 
1001 AGAGATTTCA TCTACTAAAG AGCATTTGGT TTTTCAAAAC ATCCCTGAAC 
1051 TGTATAATTT ACAAAAAAAA AAGTCTCGTC TGAGAACTGT GAACTGTGGA 
1101 AGAAATCAAA ACTATTTTTT CTTTTAAAAA GCCACGTAAT GAAACCACTA 
1151 ATGAAATCCC AGCAATCTGC TTCACATTGA AGTGGAAAAA TATCCAAAAG 
1201 GAGCAGCTTC AATTTCATTG AGGTGAAAGT GCACTATGAA GATTGTTCAC 
1251 CTTTGCTGCA TTTGGGAGTT ATATGGTTAT TTGGTAACAT TAAGAACTAC 
1301 TGGATTTTAA TGCAATCCTG CATAAAAATA TAATTTATAC TATGTGAAAA 
1351 AATAAGACAG GACTTACCAC TAGGAACCAC CAAGACCAAT CATCATTAAC 
1401 TTTTTTAAGA TTGTGTTTTA TTAAAAAAAA AAAACACTTA AATGTGTGCA 
1451 GCTATTTTCT TATGTTGAAA AGACTGAAAG TTTAAAACAT GAAAAAAATC 
1501 AATATTAAAC ATTTTTTGTT CACACTGAGA TACTGTGTAT GTAAAATGCC 
1551 TTAATTATTA ATAAGCCAAT GTGTTATGAT ACCAATATCT GTTTTAAAAA 
1601 ACTAAAACCA ACCATGCTTC TGGCATGATA AAATCATGGA ATTAAATCAG 
1651 GGGTTTACAT TCTTGTAGAG TGTTCTTGAA ACACTCTCTG CACCATTTTT 
1701 AAAACTTGAG AATAGTTTTA GTATCTCTGA TATTTTTTGC CAGAATCATC 
1751 ATGTCATGTA TGAATGTGTT ATCCCTATCT AAGGAAAAAG GTGAATATGT 
1801 TTTTGTATGA ATGTTTAACT GGAAATGTCC ATGGACTTGG CTAATTTATA 
1851 TTTACTTTTT ATTGTACATA GATTTCTAAT ATTTTTCATT CCTGTATCAT 
1901 TTAAACTTCC TTCATTTGAG TAAATTCACT AAATATTTCT ATTTTTTTGC 
1951 TTTTTTAAAT TCTGATTTTA TATGAATTCT AATTCTTTTT CACTACATAT 
2001 GTTTTAAAGA GTTACATACA GTGATTTAGA ATGGTTTACA GTTAATGCTG 
2051 ATCTTGTATT TTAAATTCCA ACACTTTGTG TCACTACCTC CTCTAATGGT 
2101 TAGTATGATA TGCTAGCAGA CTGTATGAGG TCTTTTTTTA AAATACCACT 
2151 TTTAGTGTCA GTGAACCAAA TTCTGGAATG TCTTAACAGC TCTAAATCTT 
2201 ACTTGTCTTG AAAATGATTG GGGTTTAATA CCACTGCTGG TGGTTCACAC 
2251 ATCATCCCAT CCTTAATATG CCTGACAGGC ATCTGAGCAA AGGTTTTTAG 
2301 TAATTGAATT TCTCTGCAGT AGTCCTTCAA GCACTTGAAT GTAAACCTTT 
2351 AGCATTTATT CGTTTAATGA CTACTGATAC GAATCTCAAG CAGATTTCTT 
2401 GCTCTTAAAA GTTATGTTTC ACTGAGTTCT GGTTTTGTGT AGCTATATTT 
2451 TATATAGCTA GATATTCCTC ACAGTGAACA TGAATTGTAA TAATTGGTTA 
2501 TTTCCTTAAG TCTTTAGATT ATAATAATTT CAGATTATTG CACGTCTGTG 
2551 ATTTGAGAGG TGAGTTATTT AAGAGGCCAG TTTTCAGGAC ATGGGAATTT 
2601 GAATTGTAAA CCTGTTATCT CTGTGAAACT TTTAACATGA TAAAATATAA
2651 CCTTTCTTTG TGCTTAAAAA AAAAAA 
BLAST Results



Entry HS541354 from database EMBL: 
human STS WI-11840. 
Score = 1267, P = 7.1e-50, identities = 271/281



Medline entries 



98227670: 

Katanin, a microtubule-severing protein, is a novel AAA ATPase 
that targets to the centrosome using a WD40-containing subunit. 



Peptide information for frame 3 



ORF from 87 bp to 998 bp; peptide length: 304 
Category: similarity to known protein 
Classification: unclassified 
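
The ORF coordinates, reading frame and peptide length reported for each clone are tied together by simple arithmetic. The Python sketch below shows the relationship under two assumptions that are not stated in the listing itself: coordinates are 1-based and inclusive, and the reported range excludes the stop codon. The function name `orf_summary` is illustrative only.

```python
from typing import Tuple


def orf_summary(start: int, end: int) -> Tuple[int, int]:
    """Return (frame, peptide_length) for an ORF.

    Assumes 1-based inclusive nucleotide coordinates that cover only
    sense codons (no stop codon), and frames numbered 1 to 3 from the
    first base of the insert.
    """
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    frame = (start - 1) % 3 + 1
    return frame, span // 3


# ORF of DKFZphtes3_9k22: 87 bp to 998 bp, frame 3, 304 residues.
print(orf_summary(87, 998))
# ORF of DKFZphtes3_9i20: 554 bp to 1168 bp, frame 2, 205 residues.
print(orf_summary(554, 1168))
```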



1 MASETHNVKK RNFCNKIEDH FIDLPRKKIS NFTNKNMKEV KKSPKQLAAY 
51 INRTVGQTVK SPDKLRKVIY RRKKVHHPFP NPCYRKKQSP GSGGCDMANK 
101 ENELACAGHL PEKLHHDSRT YLVNSSDSGS SQTESPSSKY SGFFSEVSQD 
151 HETMAQVLFS RNMRLNVALT FWRKRSISEL VAYLLRIEDL GVVVDCLPVL 
201 TNCLQEEKQY ISLGCCVDLL PLVKSLLKSK FEEYVIVGLN WLQAVIKRWW 
251 SELSSKTEII NDGNIQILKQ QLSGLWEQEN HLTLVPGYTG NIAKDVDAYL 
301 LQLH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_9k22, frame 3 

TREMBL:AF056021_1 product: "p80 katanin"; Xenopus laevis p80 katanin
mRNA, partial cds., N = 1, Score = 146, P = 1.2e-07

TREMBL:AF052432_1 product: "katanin p80 subunit"; Homo sapiens katanin
p80 subunit mRNA, complete cds., N = 1, Score = 150, P = 1.2e-07

TREMBL:AF052433_1 product: "katanin p80 subunit"; Strongylocentrotus
purpuratus katanin p80 subunit mRNA, complete cds., N = 2, Score = 146,
P = 4.2e-07



>TREMBL:AF052432_1 product: "katanin p80 subunit"; Homo sapiens katanin p80 
subunit mRNA, complete cds. 
Length = 655



HSPs: 



Score = 150 (22.5 bits), Expect = 1.2e-07, P = 1.2e-07
Identities = 35/105 (33%), Positives = 55/105 (52%)

Query: 145 SEVSQDHETMAQVLFSRNMRLNVALTFWRKRSISELVAYLLRIEDLGVVVDCLPVLTNCL 204
           S++ + H+TM  VL SR+  L+     W    I   V   + I DL VVVD L    N +
Sbjct: 489 SQIRKGHDTMCVVLTSRHKNLDTVRAVWTMGDIKTSVDSAVAINDLSVVVDLL----NIV 544

Query: 205 QEEKQYISLGCCVDLLPLVKSLLKSKFEEYVIVGLNWLQAVIKRW 249
            ++     L  C  +LP ++ LL+SK+E YV  G   L+ +++R+
Sbjct: 545 NQKASLWKLDLCTTVLPQIEKLLQSKYESYVQTGCTSLKLILQRF 589



Pedant information for DKFZphtes3_9k22, frame 3 



Report for DKFZphtes3_9k22.3



[LENGTH] 304 

[MW] 34767.24 

[pI] 9.18

[KW] All_Alpha






[KW] LOW_COMPLEXITY 3.95 %

SEQ MASETHNVKKRNFCNKIEDHFIDLPRKKISNFTNKNMKEVKKSPKQLAAYINRTVGQTVK

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhccccc

SEQ SPDKLRKVIYRRKKVHHPFPNPCYRKKQSPGSGGCDMANKENELACAGHLPEKLHHDSRT 

SEG 

PRD ccchhhhhhhhhhhcccccccccccccccccccccccccchhhhhhccccccccccccce 

SEQ YLVNSSDSGSSQTESPSSKYSGFFSEVSQDHETMAQVLFSRNMRLNVALTFWRKRSISEL 

SEG 

PRD eeecccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ VAYLLRIEDLGVVVDCLPVLTNCLQEEKQYISLGCCVDLLPLVKSLLKSKFEEYVIVGLN 

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhcceeeeeeccchhhhhhhhceeeccceeeehhhhhhhhhhhheeeeeeehh 

SEQ WLQAVIKRWWSELSSKTEIINDGNIQILKQQLSGLWEQENHLTLVPGYTGNIAKDVDAYL 

SEG 

PRD hhhhhhhhhhhhcccceeeeccccccccccccchhhhhhhhhhccccccccchhhhhhhh 

SEQ LQLH 

SEG 

PRD hccc 

(No Prosite data available for DKFZphtes3_9k22.3)
(No Pfam data available for DKFZphtes3_9k22.3)
Prosite Key
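
The CONSENSUS entries in this key use PROSITE pattern syntax: residues are separated by `-`, `x` is any residue, `[ABC]` lists allowed residues, `{ABC}` lists forbidden residues, `(n)` or `(n,m)` is a repeat count, and `<` or `>` anchors the pattern to the N- or C-terminus. As a hedged illustration of how such a pattern can be applied to a peptide, the Python sketch below translates a pattern into a regular expression and reports 1-based match positions. The function names `prosite_to_regex` and `find_sites` are assumptions made for this example; it handles only the syntax elements just listed and is not the scanner used to produce the reports in this document.

```python
import re
from typing import List


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    body = pattern.strip().rstrip(".")
    parts = []
    for element in body.split("-"):
        element = element.strip()
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (2) or (2,3).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"cannot parse element {element!r}")
        core, low, high = match.groups()
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_sites(pattern: str, protein: str) -> List[int]:
    """Return 1-based start positions of all (overlapping) matches."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(protein)]


# N-glycosylation pattern applied to a short fragment of a peptide.
print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(find_sites("N-{P}-[ST]-{P}.", "KKISNFTNKN"))
```

A lookahead is used in `find_sites` so that overlapping occurrences are all reported; short patterns such as the phosphorylation sites match frequently by chance, so hits are only suggestive.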

NAME: N-glycosylation site.
CONSENSUS: N-{P}-[ST]-{P}.

NAME: Glycosaminoglycan attachment site.
CONSENSUS: S-G-x-G.

NAME: Tyrosine sulfation site.

NAME: cAMP- and cGMP-dependent protein kinase phosphorylation site.
CONSENSUS: [RK](2)-x-[ST].

NAME: Protein kinase C phosphorylation site.
CONSENSUS: [ST]-x-[RK].

NAME: Casein kinase II phosphorylation site.
CONSENSUS: [ST]-x(2)-[DE].

NAME: Tyrosine kinase phosphorylation site.
CONSENSUS: [RK]-x(2,3)-[DE]-x(2,3)-Y.

NAME: N-myristoylation site.
CONSENSUS: G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}.

NAME: Amidation site.
CONSENSUS: x-G-[RK]-[RK].

NAME: Aspartic acid and asparagine hydroxylation site.
CONSENSUS: C-x-[DN]-x(4)-[FY]-x-C-x-C.

NAME: Vitamin K-dependent carboxylation domain.
CONSENSUS: x(12)-E-x(3)-E-x-C-x(6)-[DEN]-x-[LIVMFY]-x(9)-[FYW].

NAME: Phosphopantetheine attachment site. 

CONSENSUS: rDE(^STAli4KRHMLIVMrcSTACHGNQHL^ 

CONSENSUS: {PCFY}-[STAGCPQLIVMF]-[LJVMATN3-(DEN<^TAKRHLM]-tLrVMWSTA]*[LrVG 
CONSENSUS: x(2)-(LIVMFAJ. 

NAME: Acyl carrier protein phosphopantetheine domain profile. 

NAME: Prokaryotic membrane lipoprotein lipid attachment site.
CONSENSUS: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C.

NAME: Prokaryotic N-terminal methylation site. 

CONSENSUS: [KRHEQSTAG]^-[r^IVM3.[ST]-[LT][UVP]-E-[LIVMFW^ 

NAME: Prenyl group binding site (CAAX box).
CONSENSUS: C-{DENQ}-[LIVM]-x>.

NAME: Protein splicing signature.
CONSENSUS: [DNEG]-x-[LIVFA]-[LIVMY]-[LVAST]-H-N-[STC].

NAME: Endoplasmic reticulum targeting sequence.
CONSENSUS: [KRHQSA]-[DENQ]-E-L>.

NAME: Microbodies C-terminal targeting signal.
CONSENSUS: [STAGCN]-[RKH]-[LIVMAFY]>.

NAME: Gram-positive cocci surface proteins 'anchoring' hexapeptide.
CONSENSUS: L-P-x-T-G-[STGAVDE].

NAME: Bipartite nuclear targeting sequence.

NAME: Cell attachment sequence.
CONSENSUS: R-G-D.

NAME: ATP/GTP-binding site motif A (P-loop).
CONSENSUS: [AG]-x(4)-G-K-[ST].

NAME: Cyclic nucleotide-binding domain signature 1.
CONSENSUS: [UVM]-[VIC}-x(2)^[DEN(^A]-xlGAC^^ 
NAME: Cyclic nucleotide-binding domain signature 2.
CONSENSUS: [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV].

NAME: cAMP/cGMP binding motif.

NAME: EF-hand calcium-binding domain.
CONSENSUS: D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-
CONSENSUS: [DE]-[LIVMFYW].

NAME: Actinin-type actin-binding domain signature 1.
CONSENSUS: [EQ]-x(2)-[ATV]-[FY]-x(2)-W-x-N.

NAME: Actinin-type actin-binding domain signature 2.
CONSENSUS: [LIVM]-x-[SGN]-[LIVM]-[DAGHE]-[SAG]-x-[DNEAG]-[LIVM]-x-[DEAG]-x(4)-
CONSENSUS: [LIVM]-x-[LM]-[SAG]-[LIVM]-[LIVMT]-W-x-[LIVM](2).

NAME: Anaphylatoxin domain signature.
CONSENSUS: [CSH]-C-x(2)-[GAP]-x(7,8)-[GASTDEQR]-C-[GASTDEQL]-x(3,9)-[GASTDEQN]-x(2)-
CONSENSUS: [CE]-x(6,7)-C-C.

NAME: Anaphylatoxin domain profile. 

NAME: Apple domain.
CONSENSUS: C-x(3)-[LIVMFY]-x(5)-[LIVMFY]-x(3)-[DENQ]-[LIVMFY]-x(10)-C-x(3)-C-T-
CONSENSUS: x(4)-C-x-[LIVMFY]-F-x-[FY]-x(13,14)-C-x-[LIVMFY]-[RK]-x-[ST]-x(14,15)-
CONSENSUS: S-G-x-[ST]-[LIVMFY]-x(2)-C.

NAME: Band 4.1 family domain signature 1.
CONSENSUS: W-[LIV]-x(3)-[KRQ]-x-[LIVM]-x(2)-[QH]-x(0,2)-[LIVMF]-x(6,8)-[LIVMF]-
CONSENSUS: x(3,5)-F-[FY]-x(2)-[DENS].

NAME: Band 4.1 family domain signature 2.
CONSENSUS: [HYW]-x(9)-[DENQSTV]-[SA]-x(3)-[FY]-[LIVM]-x(2)-[ACV]-x(2)-[LM]-x(2)-
CONSENSUS: [FY]-G-x-[DENQST]-[LIVMFYS].

NAME: Band 4.1 family domain profile.

NAME: C1q domain signature.
CONSENSUS: F-x(5)-[ND]-x(4)-[FYWL]-x(6)-F-x(5)-G-x-Y-x-F-x-[FY].

NAME: C-terminal cystine knot signature.
CONSENSUS: C-C-x(13)-C-x(2)-[GN]-x(12)-C-x-C-x(2,4)-C.

NAME: C-terminal cystine knot profile.

NAME: CUB domain profile.

NAME: Death domain profile.

NAME: EGF-like domain signature 1.
CONSENSUS: C-x-C-x(5)-G-x(2)-C.

NAME: EGF-like domain signature 2.
CONSENSUS: C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C.

NAME: Calcium-binding EGF-like domain pattern signature.
CONSENSUS: [DEQN]-x-[DEQN](2)-C-x(3,14)-C-x(3,7)-C-x-[DN]-x(4)-[FY]-x-C.

NAME: Laminin-type EGF-like (LE) domain signature.
CONSENSUS: C-x(1,2)-C-x(5)-G-x(2)-C-x(2)-C-x(3,4)-[FYW]-x(3,15)-C.

NAME: Coagulation factors 5/8 type C domain (FA58C) signature 1.
CONSENSUS: [GAS]-W-x(7,15)-[FYW]-[LIV]-x-[LIVFA]-[GSTDEN]-x(6)-[LIVF]-x(2)-[IV]-x-
CONSENSUS: [LIVT]-[QKM]-G.

NAME: Coagulation factors 5/8 type C domain (FA58C) signature 2.
CONSENSUS: P-x(8,10)-[LM]-R-x-[GE]-[LIVP]-x-G-C.

NAME: Forkhead-associated (FHA) domain profile.

NAME: Fibrinogen beta and gamma chains C-terminal domain signature.
CONSENSUS: W-W-[LIVMFYW]-x(2)-C-x(2)-[GSA]-x(2)-N-G.

NAME: Type I fibronectin domain. 
CONSENSUS: C-x(6,8)-[LFY]-x(5)-[FYW]-x-[RK]-x(8,10)-C-x-C-x(6,9)-C.

NAME: Type II fibronectin collagen-binding domain.
CONSENSUS: C-x(2)-P-F-x-[FYWI]-x(7)-C-x(8,10)-W-C-x(4)-[DNSR]-[FYW]-x(3,5)-[FYW]-x-
CONSENSUS: [FYWI]-C.

NAME: Hemopexin domain signature.
CONSENSUS: [LIFAT]-x(3)-W-x(2,3)-[PE]-x(2)-[LIVMFY]-[DENQS]-[STA]-[AV]-[LIVMFY].

NAME: Kringle domain signature.
CONSENSUS: [FY]-C-R-N-P-[DNR].

NAME: Kringle domain profile.

NAME: LDL-receptor class A (LDLRA) domain signature.
CONSENSUS: C-[VILMA]-x(5)-C-[DNH]-x(3)-[DENQHT]-C-x(3,4)-[STADE]-[DEH]-[DE]-x(1,5)-
CONSENSUS: C.

NAME: LDL-receptor class A (LDLRA) domain profile.

NAME: C-type lectin domain signature.
CONSENSUS: C-[LIVMFYATG]-x(5,12)-[WL]-x-[DNSR]-x(2)-C-x(5,6)-[FYWLIVSTA]-[LIVMSTA]-
CONSENSUS: C.

NAME: C-type lectin domain profile.

NAME: Link domain signature.
CONSENSUS: C-x(15)-A-x(3,4)-G-x(3)-C-x(2)-G-x(8,9)-P-x(7)-C.

NAME: Osteonectin domain signature 1.
CONSENSUS: C-x-[DN]-x(2)-C-x(2)-G-[KRH]-x-C-x(6,7)-P-x-C-x-C-x(3,5)-C-P.

NAME: Osteonectin domain signature 2.
CONSENSUS: F-P-x-R-[IM]-x-D-W-L-x-[NQ].

NAME: Somatomedin B domain signature.
CONSENSUS: C-x-C-x(3)-C-x(5)-C-C-x-[DN]-[FY]-x(3)-C.

NAME: Thyroglobulin type-1 repeat signature.
CONSENSUS: [FYWHP]-x-P-x-C-x(3,4)-G-x-[FYW]-x(3)-Q-C-x(4,10)-C-[FYW]-C-V-x(3,4)-
CONSENSUS: [SG].

NAME: P-type 'Trefoil' domain signature.
CONSENSUS: R-x(2)-C-x-[FYPST]-x(3,4)-[ST]-x(3)-C-x(4)-C-C-[FYWH].

NAME: Cellulose-binding domain, bacterial type.
CONSENSUS: W-N-[STAGR]-[STDN]-[LIVM]-x(2)-[GST]-x-[GST]-x(2)-[LIVMFT]-[GA].

NAME: Cellulose-binding domain, fungal type.
CONSENSUS: C-G-G-x(4,7)-G-x(3)-C-x(5)-C-x(3,5)-[NHG]-x-[FYWM]-x(2)-Q-C.

NAME: Chitin recognition or binding domain signature.
CONSENSUS: C-x(4,5)-C-C-S-x(2)-G-x-C-G-x(4)-[FYW]-C.

NAME: Barwin domain signature 1.
CONSENSUS: C-G-[KR]-C-L-x-V-x-N.

NAME: Barwin domain signature 2.
CONSENSUS: V-[DN]-Y-[EQ]-F-V-[DN]-C.

NAME: BIR repeat.
CONSENSUS: [HKEPILVY]-x(2)-R-x(3,7)-[FYW]-x(11,14)-[STAN]-G-[LMF]-x-[FYHDA]-x(4)-
CONSENSUS: [DESL]-x(2,3)-C-x(2)-C-x(6)-[WA]-x(9)-H-x(4)-[PRSD]-x-C-x(2)-[LIVMA].

NAME: WAP-type 'four-disulfide core' domain signature.
CONSENSUS: C-x-{C}-[DN]-x(2)-C-x(5)-C-C.

NAME: Phorbol esters / diacylglycerol binding domain.
CONSENSUS: H-x-[LIVMFYW]-x(8,11)-C-x(2)-C-x(3)-[LIVMFC]-x(5,10)-C-x(2)-C-x(4)-[HD]-
CONSENSUS: x(2)-C-x(5,9)-C.

NAME: C2 domain signature.
CONSENSUS: [ACG]-x(2)-L-x(2,3)-D-x(1,2)-[NGSTLIF]-[GTMR]-x-[STAP]-D-[PA]-[FY].
NAME: C2-domain profile.
NAME: CAP-Gly domain signature. 

CONSENSUS: G-x(8,10)-|JYW]-x^-[LIVM]-x4LIVM^ 
CONSENSUS: x(2)-fl-Y]-F. 

NAME: Ly-6 / u-PAR domain signature.
CONSENSUS: [EQR]-C-[LIVMFYAH]-x-C-x(5,8)-C-x(3,8)-[EDNQSTV]-C-{C}-x(5)-C-
CONSENSUS: x(12,24)-C.

NAME: MAM domain signature.
CONSENSUS: G-x-[LIVMFY](2)-x(3)-[STA]-x(10,11)-[LV]-x(4)-[LIVMF]-x(6,7)-C-[LIVM]-x-
CONSENSUS: F-x-[LIVMFY]-x(3)-[GSC].

NAME: MAM domain profile.

NAME: PH domain profile.

NAME: Phosphotyrosine interaction domain (PID) profile.

NAME: Src homology 2 (SH2) domain profile.

NAME: Src homology 3 (SH3) domain profile.

NAME: VWFC domain signature.
CONSENSUS: C-x(2,3)-C-x-C-x(6,14)-C-x(3,4)-C-x(2,10)-C-x(9,16)-C-C-x(2,4)-C.

NAME: WW/rsp5/WWP domain signature.
CONSENSUS: W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE]-[GSTQCR]-[FYW]-x(2)-P.
NAME: WW/rsp5/WWP domain profile. 
NAME: ZP domain signature. 

CONSENSUS: (UVMFYW]-x(7MSTAPDNLl-x(3)-[LrV^ 
CONSENSUS: [LIVMrT^x-[ST]-[PSL]-xa4)-{DENS] 
CONSENSUS: C. 

NAME: S-layer homology domain signature. 

CONSENSUS: [LVFYT]-x-[DA]-x(2,5)-[DNGSATPHY]-[WYFPDAl-x(4)-[LIVl.x(2)-[GTALVl- 
CONSENSUS: x(4.6)-tUVFV'C]-xa)^-x-[PGSTA]-x(2,3)-[MFYA]-x-[PGAV]-x(3.10)-[UV^ 
CONSENSUS: [STKR]-[RY]-x-[E<M-x-lSTAUVM]. 

NAME: 'Homeobox' domain signature.
CONSENSUS: [LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]-[RKNQESTAIY]-
CONSENSUS: [LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-[RKNAIMW].

NAME: 'Homeobox' domain profile.

NAME: 'Homeobox' antennapedia-type protein signature.
CONSENSUS: [LIVMFE]-[FY]-P-W-M-[KRQTA].

NAME: 'Homeobox' engrailed-type protein signature.
CONSENSUS: L-M-A-Q-G-L-Y-N.

NAME: 'Paired box' domain signature.
CONSENSUS: R-P-C-x(11)-C-V-S.

NAME: 'POU' domain signature 1.

CONSENSUS: [RKQ]-R-[UM]-x-[Lr^-G-[UVMFYl-x.Q-x-tDNQ].V^3. 
NAME: 'POU' domain signature 2.
CONSENSUS: S-Q-[ST]-[TA]-I-[SC]-R-F-E-x-[LSQ]-x-[LI]-[ST].

NAME: Zinc finger, C2H2 type, domain.
CONSENSUS: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.

NAME: Zinc finger, C3HC4 type (RING finger), signature.
CONSENSUS: C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA].

NAME: Nuclear hormones receptors DNA-binding region signature.
CONSENSUS: C-x(2)-C-x-[DE]-x(5)-[HN]-[FY]-x(4)-C-x(2)-C-x(2)-F-F-x-R.

NAME: GATA-type zinc finger domain.
CONSENSUS: C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C.
NAME: Poly(ADP-ribose) polymerase zinc finger domain signature.
CONSENSUS: C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C.

NAME: Poly(ADP-ribose) polymerase zinc finger domain profile.

NAME: Fungal Zn(2)-Cys(6) binuclear cluster domain signature.
CONSENSUS: [GASTPV]-C-x(2)-C-[RKHSTACW]-x(2)-[RKHQ]-x(2)-C-x(5,12)-C-x(2)-C-x(6,8)-
CONSENSUS: C.

NAME: Fungal Zn(2)-Cys(6) binuclear cluster domain profile.

NAME: Prokaryotic dksA/traR C4-type zinc finger.
CONSENSUS: C-[DES]-x-C-x(3)-I-x(3)-R-x(4)-P-x(4)-C-x(2)-C.

NAME: Copper-fist domain signature.
CONSENSUS: M-[LIVMF](3)-x(3)-K-[MY]-A-C-x(2)-C-I-[KR]-x-H-[KR]-x(3)-C-x-H-x(8)-
CONSENSUS: [KR]-x-[KR]-G-R-P.

NAME: Copper fist DNA binding domain profile.

NAME: Leucine zipper pattern.
CONSENSUS: L-x(6)-L-x(6)-L-x(6)-L.

NAME: bZIP transcription factors basic domain signature.
CONSENSUS: [KR]-x(1,3)-[RKSAQ]-N-x(2)-[SAQ](2)-x-[RKTAENQ]-x-R-x-[RK].

NAME: Myb DNA-binding domain repeat signature 1.
CONSENSUS: W-[ST]-x(2)-E-[DE]-x(2)-[LIV].

NAME: Myb DNA-binding domain repeat signature 2.
CONSENSUS: W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM].

NAME: Myc-type, 'helix-loop-helix' dimerization domain signature.
CONSENSUS: [DENSTAP]-K-[LIVMWAGSN]-{FYWCPHKR}-[LIVT]-[LIV]-x(2)-[STAV]-[LIVMSTAC]-x-
CONSENSUS: [VMFYH]-[LIVMTA]-{P}-{P}-[LIVMSR].

NAME: p53 tumor antigen signature.
CONSENSUS: M-C-N-S-S-C-M-G-G-M-N-R-R.

NAME: CBF-A/NF-YB subunit signature.
CONSENSUS: C-V-S-E-x-I-S-F-[LIVM]-T-[SG]-E-A-[SC]-[DE]-[KRQ]-C.

NAME: CBF-B/NF-YA subunit signature.
CONSENSUS: Y-V-N-A-K-Q-Y-x-R-I-L-K-R-R-x-A-R-A-K-L-E.

NAME: 'Cold-shock' DNA-binding domain signature.
CONSENSUS: [FY]-G-F-I-x(6,7)-[DER]-[LIVM]-F-x-H-x-[STKR]-x-[LIVMFY].

NAME: CTF/NF-I signature.
CONSENSUS: R-K-R-K-Y-F-K-K-H-E-K-R.

NAME: Ets-domain signature 1.
CONSENSUS: L-[FYW]-[QEDH]-F-[LI]-[LVQK]-x-[LI]-L.

NAME: Ets-domain signature 2.
CONSENSUS: [RKH]-x(2)-M-x-Y-[DENQ]-x-[LIVM]-[STAG]-R-[STAG]-[LI]-R-x-Y.

NAME: Ets-domain profile.

NAME: Fork head domain signature 1.
CONSENSUS: [KR]-P-[PTQ]-[FYLVQH]-S-[FY]-x(2)-[LIVM]-x(3,4)-[AC]-[LIM].

NAME: Fork head domain signature 2.
CONSENSUS: W-[QKR]-[NS]-S-[LIV]-R-H.

NAME: Fork head domain profile.

NAME: HSF-type DNA-binding domain signature. 
CONSENSUS: I^x(3MFy}-K-H-x-N-x-[STAr^-S-F-[^ 
CONSENSUS: [UVM]. 

NAME: Tryptophan pentad repeat (IRF family) signature.
CONSENSUS: W-x-[DNH]-x(5)-[LIVF]-x-[IV]-P-W-x-H-x(9,10)-[DE]-x(2)-[LIVF]-F-[KRQ]-x-
CONSENSUS: [WR]-A.

NAME: LIM domain signature.
CONSENSUS: C-x(2)-C-x(15,21)-[FYWH]-H-x(2)-[CH]-x(2)-C-x(2)-C-x(3)-[LIVMF].

NAME: LIM domain profile.

NAME: NF-kappa-B/Rel/dorsal domain signature.
CONSENSUS: F-R-Y-x-C-E-G.

NAME: MADS-box domain signature.
CONSENSUS: R-x-[RK]-x(5)-I-x-[DN]-x(3)-[KR]-x(2)-T-[FY]-x-[RK](3)-x(2)-[LIVM]-x-
CONSENSUS: K(2)-A-x-E-[LIVM]-[ST]-x-L-x(4)-[LIVM]-x-[LIVM](3)-x(6)-[LIVMF]-x(2)-
CONSENSUS: [FY].

NAME: MADS-box domain profile.

NAME: T-box domain signature 1.
CONSENSUS: L-W-x(2)-[FC]-x(3,4)-[NT]-E-M-[LIV](2)-T-x(2)-G-[RG]-[KRQ].

NAME: T-box domain signature 2.
CONSENSUS: [LIVMYW]-H-[PADH]-[DEN]-[GS]-x(3)-G-x(2)-W-M-x(3)-[IVA]-x-F.

NAME: TEA domain signature.
CONSENSUS: G-R-N-E-L-I-x(2)-Y-I-x(3)-[TC]-x(3)-R-T-[RK](2)-Q-[LIVM]-S-S-H-[LIVM]-
CONSENSUS: Q-V.

NAME: Transcription factor TFIIB repeat signature.
CONSENSUS: G-[KR]-x(3)-[STAGN]-x-[LIVMYA]-[GSTA](2)-[CSAV]-[LIVM]-[LIVMFY]-[LIVMA]-
CONSENSUS: [GSA]-[STAC].

NAME: Transcription factor TFIID repeat signature.
CONSENSUS: Y-x-P-x(2)-[IF]-x(2)-[LIVM](2)-x-[KRH]-x(3)-P-[RKQ]-x(3)-L-[LIVM]-F-x-
CONSENSUS: [STN]-G-[KR]-[LIVM]-x(3)-G-[TAGL]-[KR]-x(7)-[AGC]-x(7)-[LIVM].

NAME: TFIIS zinc ribbon domain signature.
CONSENSUS: C-x(2)-C-x(9)-[LIVMQSAR]-[QH]-[STQL]-[RA]-[SACR]-x-[DE]-[DET]-[PGSEA]-
CONSENSUS: x(6)-C-x(2,5)-C-x(3)-[FW].

NAME: TSC-22 / dip / bun family signature.
CONSENSUS: M-D-L-V-K-x-H-L-x(2)-A-V-R-E-E-V-E.

NAME: Prokaryotic transcription elongation factors signature 1.
CONSENSUS: [ST]-x(2)-[GS]-x(3)-[LI]-x(2)-E-L-x(2)-L-x(3,4)-R-x(2)-[IV]-x(3)-[LIV]-
CONSENSUS: x(6)-G-D-x(2)-E-N-[GSA]-x-Y.

NAME: Prokaryotic transcription elongation factors signature 2.
CONSENSUS: S-x(2)-S-P-[LIVM]-[AG]-x-[SAG]-[LIVM]-[LIVMY]-x(4)-[DG]-[DE].

NAME: DEAD-box subfamily ATP-dependent helicases signature.
CONSENSUS: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN].

NAME: DEAH-box subfamily ATP-dependent helicases signature.
CONSENSUS: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR].

NAME: Eukaryotic putative RNA-binding region RNP-1 signature.
CONSENSUS: [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM].

NAME: Fibrillarin signature.
CONSENSUS: [GST]-[LIVMAP]-V-Y-A-[IV]-E-[FY]-[SA]-x-R-x(2)-R-[DE].

NAME: MCM family signature.
CONSENSUS: G-[IVT]-[LVAC](2)-[IVT]-D-[DE]-[FL]-[DNST].

NAME: MCM family domain.

NAME: XPA protein signature 1.
CONSENSUS: C-x-[DE]-C-x(3)-[LIVMF]-x(1,2)-D-x(2)-L-x(3)-F-x(4)-C-x(2)-C.

NAME: XPA protein signature 2.
CONSENSUS: [LIVM](2)-T-[KR]-T-E-x-K-x-[DE]-Y-[LIVMF](2)-x-D-x-[DE].

NAME: XPG protein signature 1.
CONSENSUS: [VI]-[KRE]-P-x-[FYIL]-V-F-D-G-x(2)-[PIL]-x-[LVC]-K.
NAME: XPG protein signature 2.
CONSENSUS: [GS]-[LIVM]-[PER]-[FYS]-[LIVM]-x-A-P-x-E-A-[DE]-[PAS]-[QS]-[CLM].

NAME: Bacterial regulatory proteins, araC family signature.
CONSENSUS: [KRQ]-[LIVMA]-x(2)-[GSTALIV]-{FYWPGDN}-x(2)-[LIVMSA]-x(4,9)-[LIVMF]-
CONSENSUS: x(2)-[LIVMSTA]-[GSTACIL]-x(3)-[GANQRF]-[LIVMFY]-x(4,5)-[LFY]-x(3)-
CONSENSUS: [FYIVA]-{FYWHCM}-x(3)-[GSADENQKR]-x-[NSTAPKL]-[PARL].

NAME: Bacterial regulatory proteins, araC family DNA-binding domain profile.

NAME: Bacterial regulatory proteins, arsR family signature.
CONSENSUS: C-x(2)-D-[LIVM]-x(6)-[ST]-x(4)-S-[HYR]-[HQ].

NAME: Bacterial regulatory proteins, asnC family signature.
CONSENSUS: [GSTAP]-x(2)-[DNEA]-[LIVM]-[GSA]-x(2)-[LIVMFY]-[GN]-[LIVMST]-[ST]-x(6)-R-
CONSENSUS: [LVT]-x(2)-[LIVM]-x(3)-G.

NAME: Bacterial regulatory proteins, crp family signature.
CONSENSUS: [LIVM]-[STAG]-[RHNW]-x(2)-[LIM]-[GA]-x-[LIVMFYA]-[LIVSC]-[GA]-x-[STACN]-
CONSENSUS: x(2)-[MST]-x-[GSTN]-R-x-[LIVMF]-x(2)-[LIVMF].

NAME: Bacterial regulatory proteins, deoR family signature.
CONSENSUS: R-x(3)-[LIVM]-x(3)-[LIVM]-x(16,17)-[STA]-x(2)-T-[LIVMA]-[RH]-[KRNA]-D-
CONSENSUS: [LIVMF].

NAME: Bacterial regulatory proteins, gntR family signature. 
CONSENSUS: [UVAPKRHPILV^x-[E<FIVMR^x(2ML^ 
CONSENSUS: [DNGSTK]-rRGTLV]-x-[STAIVP]-[LIVAhxW 

NAME: Bacterial regulatory proteins, iclR family signature.
CONSENSUS: [GA]-x(3)-[DS]-x(2)-E-x(6)-[CSA]-[LIVM]-[GSA]-x(2)-[LIVM]-[FYH]-[DN].

NAME: Bacterial regulatory proteins, lacI family signature.
CONSENSUS: [LIVM]-x-[DE]-[LIVM]-A-x(2)-[STAGV]-x-V-[GSTP]-x(2)-[STAG]-[LIVMA]-x(2)-
CONSENSUS: [LIVMFYAN]-[LIVMC].

NAME: Bacterial regulatory proteins, luxR family signature. 
CONSENSUS: [GDC]-x(2)-[NSTAVY]-x(2)-flVHCSTA]-x(2MLI^ 
CONSENSUS: [NSn-[LIVM]-x(5)-[NRHSAMUVMSTAl-x(2)-[KR]. 

NAME: Bacterial regulatory proteins, lysR family signature. 

CONSENSUS: [NQKRHSTAG]-[LIVMFn'AJ-x(2)-lSTAGLV]-[STAG].x(4)-[LIVMYCTQR]-[PSTANLVER]> 
CONSENSUS: x.[PSTAGQV].[PSTAGNVMF)-[LIVMFA]-[STAGH]-x(2)-[LIVMFl.x(2)-[LIVMFW3- 
CONSENSUS: [RKEAV]-x(2)-(UVMFYNTAEl-x(3)-[LlMVT]. 

NAME: Bacterial regulatory proteins, marR family signature. 

CONSENSUS: [STNAJ-lLlA]-x-[RNGS]-x(4)-[U4].[EIV]-x(2)-[GES]-[LnniVl-(LIVq-x(7)- 
CONSENSUS: [DN)-[RKQG]-[RK]-x(6)-T-x(2)-[GA]. 

NAME: Bacterial regulatory proteins, merR family signature. 

CONSENSUS: [GSA]-x-[LIVMFA]-[ASM]-x(2)-[STACLIV]-[GSDENQR]-[LIVC]-[STANHK]-x(3)-
CONSENSUS: [LIVM]-[RHF]-x-[YW]-[DEQ]-x(2,3)-[GHDNQ]-[LIVMF](2).

NAME: Bacterial regulatory proteins, tetR family signature. 
CONSENSUS: G-nJVMFYSl-x(2,3)-[TS]-[UVMTM^^ 

CONSENSUS: [GPAR]-x-[LIVMFl.[FYSTl-x-[HFY]-[F^.x-[DNST]^K-x(2)"[LIVMl. 

NAME: Transcriptional antiterminators bglG family signature.
CONSENSUS: [ST]-x-H-x(2)-[FA](2)-[LIVM]-[EQK]-R-x(2)-[QNK].

NAME: Sigma-54 factors family signature 1 . 
CONSENSUS: P-lUVM]-x-[LIV\n-x(2MLIVM]^ 

NAME: Sigma-54 factors family signature 2.
CONSENSUS: R-R-T-[IV]-[AT]-K-Y-R.

NAME: Sigma-54 factors family profile. 

NAME: Sigma-70 factors family signature 1. 
CONSENSUS: rpE]-[U\Wn(2MHEQSl-x-G-x-[UV^ 

NAME: Sigma-70 factors family signature 2.
CONSENSUS: (SThn-x(2MDEQ]-[UVMHGAS]-x(4ML[VM 
CONSENSUS: [LIVMA]-[EQH]-x(3)-[LIVMFW]-x(2)-[LIVM].

NAME: Sigma-70 factors ECF subfamily signature. 
CONSENSUS: [STAIVHPQDELHDE]-[UVHI^ 

CONSENSUS: [GSTAIV]-[LIMFYWQ]-x(12,14)-[STAP]-[FYW]-[LIF]-x(2)-[TV].

NAME: Sigma-54 interaction domain ATP-binding region A signature.
CONSENSUS: [LIVMFY](3)-x-G-[DEQ]-[STE]-G-[STAV]-G-K-x(2)-[LIVMFY].

NAME: Sigma-54 interaction domain ATP-binding region B signature.
CONSENSUS: [GS]-x-[LIVMF]-x(2)-A-[DNEQASH]-[GNEK]-G-[STIM]-[LIVMFY](3)-[DE]-[EK]-
CONSENSUS: [LIVM].

NAME: Sigma-54 interaction domain C-terminal part signature.
CONSENSUS: [FYW]-P-[GS]-N-[LIVM]-R-[EQ]-L-x-[NHAT].

NAME: Sigma-54 interaction domain profile.

NAME: Single-strand binding protein family signature 1.
CONSENSUS: [LIVMF]-[NST]-[KRT]-[LIVM]-x-[LIVMF](2)-G-[NHRK]-[LIVA]-[GST]-x-[DET].

NAME: Single-strand binding protein family signature 2.
CONSENSUS: T-x-W-[HY]-[RNS]-[LIVM]-x-[LIVMF]-[FY]-[NGKR].

NAME: Bacterial histone-like DNA-binding proteins signature.
CONSENSUS: [GSK]-F-x(2)-[LIVMF]-x(4)-[RKEQA]-x(2)-[RST]-x-[GA]-x-[KN]-P-x-T.

NAME: Dps protein family signature 1.
CONSENSUS: H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)-[DE].

NAME: Dps protein family signature 2.
CONSENSUS: [LIVMFY]-[DH]-x-[LIVM]-[GA]-E-R-x(3)-[LIF]-[GDN]-x(2)-[PA].

NAME: DNA repair protein radC family signature.
CONSENSUS: H-N-H-P-S-G.

NAME: recA signature.
CONSENSUS: A-L-[KR]-[IF]-[FY]-[STA]-[STAD]-[LIVMQ]-R.

NAME: RecF protein signature 1.
CONSENSUS: P-[ED]-x(3)-[LIVM](2)-x-G-[GSAD]-P-x(2)-R-R-x-[FY]-[LIVM]-D.

NAME: RecF protein signature 2.
CONSENSUS: [LIVMFY](2)-x-D-x(2,3)-[SA]-[EH]-L-D-x(2)-[KRH]-x(3)-L.
NAME: RecR protein signature. 

CONSENSUS: C-x(2)-C-x(3)-[ST]-x(4)-C-x-I-C-x(4)-R.

NAME: Histone H2A signature.
CONSENSUS: [AC]-G-L-x-F-P-V.

NAME: Histone H2B signature.
CONSENSUS: [KR]-E-[LIVM]-[EQ]-T-x(2)-[KR]-x-[LIVM](2)-x-[PAG]-[DE]-L-x-[KR]-H-A-
CONSENSUS: [LIVM]-[STA]-E-G.

NAME: Histone H3 signature 1.
CONSENSUS: K-A-P-R-K-Q-L.

NAME: Histone H3 signature 2.
CONSENSUS: P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV].

NAME: Histone H4 signature.
CONSENSUS: G-A-K-R-H.

NAME: HMG1/2 signature. 

CONSENSUS: [FI]-S-[KR]-K-C-S-[EK]-R-W-K-T-M.

NAME: HMG-I and HMG-Y DNA-binding domain (A+T-hook). 
CONSENSUS: [AT]-x(1,2)-[RK](2)-[GP]-R-G-R-P-[RK]-x.

NAME: HMG14 and HMG17 signature. 
CONSENSUS: R-R-S-A-R-L-S-A-[RK]-P. 

NAME: Bromodomain signature. 
CONSENSUS: [STANVF]-x(2)-F-x(4)-[DNS]-x(5,7)-[DENQTF]-Y-[HFY]-x(2)-[LIVMFY]-x(3)-
CONSENSUS: [LIVM]-x(4)-[LIVM]-x(6,8)-Y-x(12,13)-[LIVM]-x(2)-N-[SACF]-x(2)-[FY].

NAME: Bromodomain profile.

NAME: Chromo domain signature.
CONSENSUS: [FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-[ST]-W-[ES]-[PSTDN]-x(3)-
CONSENSUS: [LIVMC].

NAME: Chromo and chromo shadow domain profile.

NAME: Regulator of chromosome condensation (RCC1) signature 1.
CONSENSUS: G-x-N-D-x(2)-[AV]-L-G-R-x-T.

NAME: Regulator of chromosome condensation (RCC1) signature 2.
CONSENSUS: [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM].

NAME: Protamine P1 signature.
CONSENSUS: [AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S.

NAME: Nuclear transition protein 1 signature.
CONSENSUS: S-K-R-K-Y-R-K.

NAME: Nuclear transition protein 2 signature 1.
CONSENSUS: H-x(3)-H-S-[NS]-S-x-P-Q-S.

NAME: Nuclear transition protein 2 signature 2.
CONSENSUS: K-x-R-K-x(2)-E-G-K-x(2)-K-[KR]-K.

NAME: Ribosomal protein L1 signature.
CONSENSUS: [IM]-x(2>[UVA]-x(2,3)-[UVM^ 
CONSENSUS: [LMF]-P-[DENSTK]. 

NAME: Ribosomal protein L2 signature.
CONSENSUS: P-x(2)-R-G-[STAIV](2)-x-N-[APK]-x-[DE].

NAME: Ribosomal protein L3 signature.
CONSENSUS: [FL]-x(6)-[DN]-x(2)-[AGS]-x-[ST]-x-G-[KRH]-G-x(2)-G-x(3)-R.

NAME: Ribosomal protein L5 signature.

CONSENSUS: [LI\^-x(2MLIVMMSTACHGEHQV]-x(2)-[^ 
CONSENSUS: x-[STA]. 

NAME: Ribosomal protein L6 signature 1.
CONSENSUS: [PS]-[DENS]-x-Y-K-[GA]-K-G-[LIVM]. 

NAME: Ribosomal protein L6 signature 2. 

CONSENSUS: <^x(3)-[LTVM]-x(2).[KR]-x(2)-R-x-F-x-D^-[UVMJ-Y-[LIVM]-x(2)-[KR]. 
NAME: Ribosomal protein L9 signature.
CONSENSUS: G-x(2)-[GN]-x(4)-V-x(2)-G-[FY]-x(2)-N-[FY]-L-x(5)-[GA]-x(3)-[STN].

NAME: Ribosomal protein L10 signature.

CONSENSUS: [DEH]-x(2^[GS]-[UVMr1.1STr^-[VAl.x-{DEQKl.[LIVMAl.x(2>[U^ 

NAME: Ribosomal protein L11 signature.
CONSENSUS: [RKN]-x-[UVNfl-x^-[ST]-x(2)-^ 

NAME: Ribosomal protein L13 signature.

CONSENSUS: [UVM]-[KRVHGK]-M-[LIV>[PS]-x(4,5M^^ 

CONSENSUS: [LFY]-x-[GDN]. 

NAME: Ribosomal protein L14 signature. 

CONSENSUS: [GA3-O^(3)-x(9,10)-(DNSl-G-x(4).[F^-x(2HNT]-x(2).V.rLIVl. 

NAME: Ribosomal protein L15 signature.

CONSENSUS: K-[LIVM](2HGAL]-x-[GT)-x-CUVMAM^ 

CONSENSUS: [LIVMFC]-[ST]-x(2)-A-x(3)-[LIVM]-x(3)-G.

NAME: Ribosomal protein L16 signature 1.

CONSENSUS: [KRl-R-x-[GSAq-^KQVA^[lJVM^W-[^^ 

NAME: Ribosomal protein L16 signature 2.
CONSENSUS: R-M-G-x-[GR]-K-G-x(4)-[FWKR].

NAME: Ribosomal protein L17 signature.

CONSENSUS: ^x-{STMGT^x(2)-^KR^x-K-x<6^[DE^x-[l^ 

NAME: Ribosomal protein L19 signature.

CONSENSUS: [RT]-[KRSVY]-[GSA]-x-V-[RS]-[KR]-[SA]-K-L-Y-Y-L-R.
NAME: Ribosomal protein L20 signature.

CONSENSUS: K-x(3)-[KRC]-x-[LIVM]-W-[IV]-[STNALV]-R-[LIVM]-N-x(3)-[RKH].
NAME: Ribosomal protein L21 signature.

CONSENSUS: [IVT]-x(3)-[KR]-x(3)-[KRQ]-K-x(6)-G-[HF]-R-[RQ]-x(2)-T.
NAME: Ribosomal protein L22 signature.

CONSENSUS: (RKQN]-x(4MRHHGAS>x-(HKRQS]-x(9>(H^ 
NAME: Ribosomal protein L23 signature.

CONSENSUS: {RK](2)-{AM]-[IVr^-aV]-[RKT]-L4STANQK]-x(7).[LIVMFT]. 
NAME: Ribosomal protein L24 signature.

CONSENSUS: [GDEN]-D-x-V-x-[IV]-[LIVMA]-x-G-x(2)-[KA]-[GN]-x(2,3)-[GA]-x-[IV].

NAME: Ribosomal protein L27 signature.
CONSENSUS: G-x-[LIVM](2)-x-R-Q-R-G-x(5)-G. 
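
The consensus lines in this listing use PROSITE pattern notation: elements are separated by hyphens, x stands for any residue, square brackets list the residues allowed at a position, curly braces list residues excluded, and a number or range in parentheses is a repeat count. The following is a minimal sketch, not part of the original listing, of how such a line might be translated into a regular expression. The function name prosite_to_regex is hypothetical, and the handling of the "<" and ">" terminal anchors is an assumption based on the usual PROSITE conventions.

```python
import re

# One pattern element: x, a single residue, an allowed set [..] or an
# excluded set {..}, optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus string into a Python regex string."""
    pattern = pattern.strip().rstrip(".")      # drop the terminating period
    parts = []
    for element in pattern.split("-"):
        anchor_start = element.startswith("<")  # pattern tied to the N-terminus
        anchor_end = element.endswith(">")      # pattern tied to the C-terminus
        element = element.strip("<>")
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            token = "."                         # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"     # excluded residues
        else:
            token = core                        # literal residue or allowed set
        if lo is not None:
            token += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if anchor_start else "") + token + ("$" if anchor_end else ""))
    return "".join(parts)


if __name__ == "__main__":
    print(prosite_to_regex("G-x-[LIVM](2)-x-R-Q-R-G-x(5)-G."))
```

A line that has lost characters in extraction will raise ValueError rather than yield a silently wrong expression, which is the intended behaviour of this sketch.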

NAME: Ribosomal protein L29 signature.

CONSENSUS: [KNQS1-[PSTL]-x(2).{LIMFA]^KRGSANhx-[LIVYSTA]-[KR]-lKRH|-lDESTANRL]- 
CONSENSUS: [LIV]-A-[KRCQVT]-[LIVMA].

NAME: Ribosomal protein L30 signature.
CONSENSUS: [TVTMUVM]-x(2MLr1-x-tU>x-[^ 
CONSENSUS: x(10)-[LMS]-[LIV]-x(2)-[LIVA]-x(2)-[LMFY]-[IVT].

NAME: Ribosomal protein L31 signature.

CONSENSUS: H-P-F-[FY]-[TI]-x(9)-G-R-[AV]-x-[KR].

NAME: Ribosomal protein L33 signature.

CONSENSUS: Y-x-[ST]-x-[KR]-[NS]-x(4)-[PAT]-x(1,2)-[LIVM]-[EA]-x(2)-K-[FY]-[CSD].
NAME: Ribosomal protein L34 signature.

CONSENSUS: K-[RG]-T-[FYWL]-[EQS]-x(5)-[KRHS]-x(4,5)-G-F-x(2)-R.
NAME: Ribosomal protein L35 signature.

CONSENSUS: [LIVM]-K-[TV]-x(2)-[GSA]-[SAIL]-x-K-R-[LIVMFY]-[KRL].
NAME: Ribosomal protein L36 signature.

CONSENSUS: C-x(2)-C-x(2)-[LIVM]-x-R-x(3)-[LIVMN]-x-[LIVM]-x-C-x(3,4)-[KR]-H-x-Q-x-Q.
NAME: Ribosomal protein L1e signature.

CONSENSUS: N-x(3)-[KR]-x(2)-A-[LIVT]-x-S-A-[LIV]-x-A-[ST]-[SGA]-x(7)-[RK]-G-H.

NAME: Ribosomal protein L6e signature.

CONSENSUS: N-x(2)-P-L-R-R-x(4)-[FY]-V-I-A-T-S-x-K.

NAME: Ribosomal protein L7Ae signature.

CONSENSUS: [CA]-x(4)-[IV]-P-[FY]-x(2)-[LIVM]-x-[GSQ]-[KRQ]-x(2)-L-G.

NAME: Ribosomal protein L10e signature.

CONSENSUS: R-x-A-[FYW]-G-K-[PA]-x-G-x(2)-A-R-V.

NAME: Ribosomal protein L13e signature.

CONSENSUS: [KR]-Y-x(2)-K-[LIVM]-R-[STA]-G-[KR]-G-F-[ST]-L-x-E.

NAME: Ribosomal protein L15e signature.

CONSENSUS: [DE]-[KR]-A-R-x4--G-[FY]-x-fSAP]-x<2)^ 

NAME: Ribosomal protein L18e signature.
CONSENSUS: flOlE]-x-L-x(2MPS]-ntRM2)- 
CONSENSUS: [LIVM]. 

NAME: Ribosomal protein L19e signature.

CONSENSUS: R-x-[KR]-x(5V[KR]-x<3V[iaWl-x(2)^-x^-x-R-x^-x(3)'A-R-x(3>[KQ]- 
CONSENSUS: x(2)-W-x(7)-R-x(2)-L-x(3)-R.

NAME: Ribosomal protein L21e signature.

CONSENSUS: G-[DE]-x-V-x(10)-[GV]-x(2)-[FYH]-x(2)-[FY]-x-G-x-T-G.
NAME: Ribosomal protein L24e signature.

CONSENSUS: [FY]-x-[GS]-x(2)-[TV]-x-P-G-x-G-x(2)-[FYV]-x-[KRHE]-x-D.

NAME: Ribosomal protein L27e signature.
CONSENSUS: G-K-N-x-W-F-F-x-K-L-R-F>.

NAME: Ribosomal protein L30e signature 1.
CONSENSUS: [STAJ-x(5)-(^x-[QKR]-x(2)-[U^ 

NAME: Ribosomal protein L30e signature 2.

CONSENSUS: [DE]-L-G-[STA]-x(2)-G-[KR]-x(6)-[LIVM]-x-[LIVM]-x-[DEN]-x-G.
NAME: Ribosomal protein L31e signature.

CONSENSUS: V-[KR]-[LIVM]-x(3)-[LIVM]-N-x-[AK]-x-W-x-[KR]-G.
NAME: Ribosomal protein L32e signature.

CONSENSUS: F-x-R-x(4)-[KR]-x(2)-[KR]-[LIVM]-x(3)-W-R-[KR]-x(2)-G.

NAME: Ribosomal protein L34e signature.
CONSENSUS: Y-x-[ST]-x-S-[NY]-x(5)-[KR]-T-P-G.

NAME: Ribosomal protein L35Ae signature.

CONSENSUS: G-K-[LIVM]-x-R-x-H-G-x(2)-G-x-V-x-A-x-F-x(3)-[LI]-P.
NAME: Ribosomal protein L36e signature.

CONSENSUS: P-Y-E-[KR]-R-x-[LIVM]-[DE]-[LIVM](2)-[KR].
NAME: Ribosomal protein L37e signature.

CONSENSUS: G-T-x-[SA]-x-G-x-[KR]-x(3)-[ST]-x(0,1)-H-x(2)-C-x-R-C-G.
NAME: Ribosomal protein L39e signature.

CONSENSUS: [KRA]-T-x(3)-[LIVM]-[KRQF]-x-[NHS]-x(3)-R-[NHY]-W-R-R.

NAME: Ribosomal protein L44e signature.
CONSENSUS: K-x-fnn-K-K-x(2)-L-[KR]-x(2)-C. 

NAME: Ribosomal protein S2 signature 1.

CONSENSUS: rLIVMFA^x<2)-[UVMFIr](2)-x-[STA^^ 

NAME: Ribosomal protein S2 signature 2.

CONSENSUS: P-x(2)-[LIVMF](2)-[LIVMS]-x-[GDN]-x(3)-[DENL]-x(3)-[LIVM]-x-E-x(4)-
CONSENSUS: [GNQKRH]-[LIVM]-[AP].

NAME: Ribosomal protein S3 signature.

CONSENSUS: [GSTA]-[KR]-x(6)-G-x-[LIVMT]-x(2)-[NQSCH]-x(1,3)-[LIVFCA]-x(3)-[LIV]-
CONSENSUS: [DENQ]-x(7)-[LMT]-x(2)-G-x(2)-G.

NAME: Ribosomal protein S4 signature.

CONSENSUS: [LIVM]-[DE]-x-R-L-x(3)-[LIVMC]-[VMFYHQ]-[KRT]-x(3)-[STAGCF]-x-[ST]-x(3)-
CONSENSUS: [SAI]-[KR]-x-[LIVMF](2).

NAME: Ribosomal protein S5 signature.

CONSENSUS: G-[KRQ]-x(3)-[FY]-x-[ACV]-x(2)-[LIVMA]-[LIVM]-[AG]-[DN]-x(2)-G-x-
CONSENSUS: [LIVM]-G-x-[SAG]-x(5,6)-[DEQ]-[LIVM]-x(2)-A-[LIVMF].

NAME: Ribosomal protein S6 signature.

CONSENSUS: G-x-[KRC]-[DENQRH]-L-[SA]-Y-x-I-[KRNSA].
NAME: Ribosomal protein S7 signature.

CONSENSUS: [DENSK]-x-[LIVME^x(3)-[LIVMFn(2)^^ 
CONSENSUS: x(2)-[STA].

NAME: Ribosomal protein S8 signature.

CONSENSUS: {GE]-x(2V[UV](2HSTYl-T-x(2)^^IVM](2)-x(4)-[AG}-n«HAYIl. 
NAME: Ribosomal protein S9 signature.

CONSENSUS: G-G-G-x(2)-[GSA]-Q-x(2)-[SA]-x(3)-[GSA]-x-[GSTAV]-[KR]-[GSAL]-[LIF].
NAME: Ribosomal protein S10 signature.

CONSENSUS: [AV]-x(3)-[GDNSR]-[LIVMSTA]-x(3)-G-P-[LIVM]-x-[LIVM]-P-T.

NAME: Ribosomal protein S11 signature.

CONSENSUS: [LIVMF]-x-[GSTAC]-[LIVMF]-x(2)-[GSTAL]-x(0,1)-[GSN]-[LIVMF]-x-[LIVM]-
CONSENSUS: x(4)-[DEN]-x-T-P-x-[PA]-[STCH]-[DN].

NAME: Ribosomal protein S12 signature. 
CONSENSUS: [RK]-x-P-N-S-[AR]-x-R.

NAME: Ribosomal protein S13 signature. 

CONSENSUS: [KRQS]-G-x-R-H-x(2)-[GSNH]-x(2)-[LIVMC]-R-G-Q.
NAME: Ribosomal protein S14 signature. 

CONSENSUS: [RP]-x(0,1)-C-x(11,12)-[LIVMF]-x-[LIVMF]-[SC]-[RG]-x(3)-[RN].
NAME: Ribosomal protein S15 signature.

CONSENSUS: [LIVM]-x(2)-H-[LIVMFY]-x(5)-D-x(2)-[SAGN]-x(3)-[LF]-x(9)-[LIVM]-x(2)-
CONSENSUS: [FY].

NAME: Ribosomal protein S16 signature. 

CONSENSUS: [LIVMT]-x-[LIVM]-[KR]-L-[STAK]-R-x-G-[AKR].
NAME: Ribosomal protein S17 signature. 

CONSENSUS: G-D-x-[LIV]-x-[LIVA]-x-[QEK]-x-[RK]-P-[LIV]-S.
NAME: Ribosomal protein S18 signature.

CONSENSUS: [IV]-[DY]-Y-x(2)-[LIVMT]-x(2)-[LIVM]-x(2)-[FYT]-[LIVM]-[ST]-[DERP]-x-
CONSENSUS: [GY]-K-[LIVM]-x(3)-R-[LIVMAS].

NAME: Ribosomal protein S19 signature. 

CONSENSUS: [STDNQ]-G-[KRQM]-x(6)-[LIVM]-x(4)-[LIVM]-[GSD]-x(2)-[LF]-[GAS]-[DE]-F-
CONSENSUS: x(2)-[ST].

NAME: Ribosomal protein S21 signature. 

CONSENSUS: [DE]-x-A-[LY]-[KR]-R-F-K-[KR]-x(3)-[KR].

NAME: Ribosomal protein S3Ae signature. 

CONSENSUS: [LIV]-x-[GH]-R-[IV]-x-E-x-[SC]-L-x-D-L. 

NAME: Ribosomal protein S4e signature. 

CONSENSUS: H-x-K-R-[LIVM]-[SAN]-x-P-x(2)-W-x-[LIVM]-x-[KR].

NAME: Ribosomal protein S6e signature. 

CONSENSUS: [LIVM]-[STAMR]-G-G-x-D-x(2)-G-x-P-M.

NAME: Ribosomal protein S7e signature. 

CONSENSUS: [KR]-L-x-R-E-L-E-K-K-F-[SAP]-x-[KR]-H.

NAME: Ribosomal protein S8e signature. 

CONSENSUS: R-x(2)-T-G-[GA]-x(5)-[HR]-K-[KR]-x-K-x-E-[LM]-G.

NAME: Ribosomal protein S12e signature. 

CONSENSUS: A-L-[KRQP]-x-V-L-x(2)-[SA]-x(3)-[DN]-G-L.

NAME: Ribosomal protein S17e signature. 

CONSENSUS: A-x-I-x-[ST]-K-x-L-R-N-[KR]-I-A-G-[FY]-x-T-H.
NAME: Ribosomal protein S19e signature. 

CONSENSUS: P-x(6)-[SAN]-x(2)-[LIVMA]-x-R-x-[ALIV]-[LV]-Q-x-L-[EQ].

NAME: Ribosomal protein S21e signature. 
CONSENSUS: L-Y-V-P-R-K-C-S-[SA]. 

NAME: Ribosomal protein S24e signature. 

CONSENSUS: [FA]-G-x(2)-[KR]-[STA]-x-G-[FY]-[GA]-x-[LIVM]-Y-[DN]-[SN].

NAME: Ribosomal protein S26e signature. 
CONSENSUS: [YH]-C-V-S-C-A-I-H.

NAME: Ribosomal protein S27e signature. 

CONSENSUS: [QK]-C-x(2)-C-x(6)-F-[GS]-x-[PSA]-x(5)-C-x(2)-C-[GS]-x(2)-L-x(2)-P-x-G.

NAME: Ribosomal protein S28e signature. 
CONSENSUS: E-[ST]-E-R-E-A-R-x-L. 

NAME: DNA mismatch repair proteins mutL / hexB / PMS1 signature.
CONSENSUS: G-F-R-G-E-A-L.

NAME: DNA mismatch repair proteins mutS family signature. 

CONSENSUS: [ST]-[LIVM]-x-[LIVM]-x-D-E-[LIVMY]-[GC]-[RKH]-G-[GST]-x(4)-G.
NAME: mutT domain signature. 

CONSENSUS: G-x(5)-E-x(4)-[STAGC]-[LIVMAC]-x-R-E-[LIVMFT]-x-E-E.
NAME: DnaA protein signature. 

CONSENSUS: I-[GA]-x(2)-[LIVMF]-[SGDNK]-x(0,1)-[KR]-x-H-[STP]-[STV]-[LIVM](2)-x-
CONSENSUS: [SA]-x(2)-[KRE]-[LIVM].

NAME: Small, acid-soluble spore proteins, alpha/beta type, signature 1. 
CONSENSUS: K-x-E-[LIV]-A-x-[DE]-[LIVMF]-G-[LIVMF].

NAME: Small, acid-soluble spore proteins, alpha/beta type, signature 2. 
CONSENSUS: [KR]-[SAQ]-x-G-x-V-G-G-x-[LIVM]-x-[KR](2)-[LIVM](2).

NAME: Zinc-containing alcohol dehydrogenases signature.
CONSENSUS: G-H-E-x(2)-G-x(5)-[GA]-x(2)-[IVSAC]. 

NAME: Quinone oxidoreductase / zeta-crystallin signature.

CONSENSUS: [GSD]-[DEQH]-x(2)-L-x(3)-[SA](2)-G-G-x-G-x(4)-Q.x(2)-[KR]. 
NAME: Iron-containing alcohol dehydrogenases signature 1.

CONSENSUS: [STALIV]-[LIVF]-x-[DE]-x(6,7)-P-x(4)-[ALIV]-x-[GST]-x(2)-D-[TAIVM]-
CONSENSUS: [LIVMF]-x(4)-E.

NAME: Iron-containing alcohol dehydrogenases signature 2. 

CONSENSUS: [GSW]-x-[LIVTSACD]-[GH]-x(2)-[GSAE]-[GSHYQ]-x-[LIVTP]-[GAST]-[GAS]-x(3)-
CONSENSUS: [LIVMT]-x-[HNS]-[GA]-x-[GTAC].

NAME: Short-chain dehydrogenases/reductases family signature. 

CONSENSUS: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFR]-
CONSENSUS: [LIVMSTAGD]-x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM].

NAME: Aldo/keto reductase family signature 1. 

CONSENSUS: G-[FY]-R-[HSAL]-[LIVMF]-D-[STAGC]-[AS]-x(5)-E-x(2)-[LIVM]-G.
NAME: Aldo/keto reductase family signature 2. 

CONSENSUS: [LIVMFY]-x(9)-[KREQ]-x-[LIVM]-G-[LIVM]-[SC]-N-[FY].
NAME: Aldo/keto reductase family putative active site signature. 

CONSENSUS: [LIVM]-[PAIV]-[KR]-[ST]-x(4)-R-x(2)-[GSTAEQK]-[NSL]-x(2)-[LIVMFA].
NAME: Homoserine dehydrogenase signature. 

CONSENSUS: A-x(3)-G-[LIVMFY]-[STAG]-x(2,3)-[DNS]-P-x(2)-D-[LIVM]-x-G-x-D-x(3)-K.
NAME: NAD-dependent glycerol-3-phosphate dehydrogenase signature. 

CONSENSUS: G-[AT]-[LIVM]-K-[DN]-[LIVM](2)-A-x-[GA]-x-G-[LIVMF]-x-[DE]-G-[LIVM]-x-
CONSENSUS: [LIVMFYW]-G-x-N. 

NAME: FAD-dependent glycerol-3-phosphate dehydrogenase signature 1.
CONSENSUS: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G.

NAME: FAD-dependent glycerol-3-phosphate dehydrogenase signature 2. 
CONSENSUS: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A.

NAME: Mannitol dehydrogenases signature. 

CONSENSUS: [LIVMY]-x-[FS]-x(2)-[STAGCV]-x-V-D-R-[IV]-x-[PS].
NAME: Histidinol dehydrogenase signature. 

CONSENSUS: I-D-x(2)-A-G-P-[ST]-E-[LIVS]-[LIVMA](3)-[AC]-x(3)-A-x(4)-[LIVM]-[AV]-
CONSENSUS: [SACL]-[DE]-[LIVMFC]-[LIVM]-[SA]-x(2)-E-H.

NAME: L-lactate dehydrogenase active site.
CONSENSUS: [LIVMA]-G-[EQ]-H-G-[DN]-[ST].

NAME: D-isomer specific 2-hydroxyacid dehydrogenases NAD-binding signature.
CONSENSUS: [UVMAl-[AG]-[IVT]-{UVMFn-[AG]^^ 
CONSENSUS: [UViMT]-x(2HFYwCTO]-[DNSTK). 

NAME: D-isomer specific 2-hydroxyacid dehydrogenases signature 2. 
CONSENSUS: (UVMFi^A]-[UVr^q-xaMSACHDNQHRl^
CONSENSUS: P-x(4)-[STN]-x(2)-[LIVMF]-x-[GSDN].

NAME: D-isomer specific 2-hydroxyacid dehydrogenases signature 3. 
CONSENSUS: [UtfFATCl-[KPQ)-x-[GSTDN]-x-^^ 
CONSENSUS: [LIVH]-[LIVMC]-[DNV].

NAME: 3-hydroxyisobutyrate dehydrogenase signature.
CONSENSUS: [LIVMFY](2)-G-L-G-x-[MQ]-G-x-[PGS]-[MA]-[SA].

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 1.
CONSENSUS: [RKH]-x(6)-D-x-M-G-x-N-x-[LIVMA].

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 2. 
CONSENSUS: [LIVM]-G-x-[LIVM]-G-G-[AG]-T.

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 3. 

CONSENSUS: A-[LIVM]-x-[STAN]-x(2)-[LI]-x-[KRNQ]-[GSA]-H-[LM]-x-[FYLH].

NAME: Hydroxymethylglutaryl-coenzyme A reductases profile. 

NAME: 3-hydroxyacyl-CoA dehydrogenase signature. 
CONSENSUS: [DNE]-x(2)-[GA]-F-[LIVMFY]-x-[NT]-R-x(3)-[PA]-[LIVMFY](2)-x(5)-
CONSENSUS: [LIVMFYCT]-[LIVMFY]-x(2)-[GV].

NAME: Malate dehydrogenase active site signature. 

CONSENSUS: [LIVM]-T-[TRKMN]-L-D-x(2)-R-[STA]-x(3)-[LIVMFY].
NAME: Malic enzymes signature. 

CONSENSUS: F-x-[DV]-D-x(2)-G-T-[GSA]-x-[IV]-x-[LIVMA]-[GAST](2)-[LIVMF](2).
NAME: Isocitrate and isopropylmalate dehydrogenases signature.

CONSENSUS: [NS]-[LIMYT]-[FYDN]-G-[DNT]-[IMVY]-x-[STGDN]-[DN]-x(2)-[SGAP]-x(3,4)-G-
CONSENSUS: [STG]-[LIVMPA]-G-[LIVMF].

NAME: 6-phosphogluconate dehydrogenase signature.
CONSENSUS: [LIVM]-x-D-x(2)-[GA]-[NQS]-K-G-T-G-x-W.

NAME: Glucose-6-phosphate dehydrogenase active site.
CONSENSUS: D-H-Y-L-G-K-[EQK].

NAME: IMP dehydrogenase / GMP reductase signature. 

CONSENSUS: [LIVM]-[RK]-[LIVM]-G-[LIVM]-G-x-G-S-[LIVM]-C-x-T.

NAME: Bacterial quinoprotein dehydrogenases signature 1. 

CONSENSUS: [DEN]-W-x(3)-G-[RK]-x(6)-[FYW]-S-x(4)-[LIVM]-N-x(2)-N-V-x(2)-L-[RK].

NAME: Bacterial quinoprotein dehydrogenases signature 2. 

CONSENSUS: W-x(4)-Y-D-x(3)-[DN]-[LIVMFY](4)-x(2)-G-x(2)-[STA]-P.

NAME: FMN-dependent alpha-hydroxy acid dehydrogenases active site. 
CONSENSUS: S-N-H-G-[AG]-R-Q.

NAME: GMC oxidoreductases signature 1. 
CONSENSUS: [GAHWa*>x-[LIV3-G<2MGST](2)^ 
CONSENSUS: [DNESH]. 

NAME: GMC oxidoreductases signature 2. 

CONSENSUS: [GS]-[PSTA]-x(2)-[ST]-P-x-[LIVM](2)-x(2)-S-G-[LIVM]-G.
NAME: Eukaryotic molybdopterin oxidoreductases signature. 

CONSENSUS: [GA]-x(3)-[KRNQHT]-x(11,14)-[LIVMFYWS]-x(8)-[LIVMF]-x-C-x(2)-[DEN]-R-
CONSENSUS: x(2)-[DE].

NAME: Prokaryotic molybdopterin oxidoreductases signature 1.
CONSENSUS: [STAhq-x-[CH]-x(2JK-[CTAG]-[GSTVT^ 
CONSENSUS: [DENQKHT]. 

NAME: Prokaryotic molybdopterin oxidoreductases signature 2. 

CONSENSUS: [STA]-x-[STAC](2)-x(2)-[STA]-D-[LIVMY](2)-L-P-x-[STAC](2)-x(2)-E.
NAME: Prokaryotic molybdopterin oxidoreductases signature 3. 

CONSENSUS: A-x(3)-[GDT]-I-x-[DNQTK]-x-[DEA]-x-[LIVM]-x-[LIVMC]-x-[NS]-x(2)-[GS]-
CONSENSUS: x(5)-A-x-[LIVM]-[ST].

NAME: Aldehyde dehydrogenases glutamic acid active site.

CONSENSUS: [LIVMFGA]-E-[LIMSTAC]-[GS]-G-[KNLM]-[SADN]-[TAPFV].

NAME: Aldehyde dehydrogenases cysteine active site. 

CONSENSUS: [FYLVA]-x(3)-G-[QE]-x-C-[LIVMGSTANC]-[AGCN]-x-[GSTADNEKR].

NAME: Aspartate-semialdehyde dehydrogenase signature. 

CONSENSUS: [LIVM]-[SADN]-x(2)-C-x-R-[LIVM]-x(4)-[GSC]-H-[STA].

NAME: Glyceraldehyde 3-phosphate dehydrogenase active site. 
CONSENSUS: [ASV]-S-C-[NT]-T-x(2)-[LIM]. 

NAME: N-acetyl-gamma-glutamyl-phosphate reductase active site.

CONSENSUS: [LIVM]-[GSA]-x-P-G-C-[FY]-[AVP]-T-[GA]-x(3)-[GTAC]-[LIVM]-x-P.

NAME: Gamma-glutamyl phosphate reductase signature. 

CONSENSUS: V-x(5)-A-[LIV]-x-H-I-x(2)-[HY]-[GS]-[ST]-x-H-[ST]-[DE]-x-I.

NAME: Dihydrodipicolinate reductase signature. 
CONSENSUS: E-[IV]-x-E-x-H-x(3)-K-x-D-x-P-S-G-T-A. 

NAME: Dihydroorotate dehydrogenase signature 1. 

CONSENSUS: [GS]-x(4)-[GK]-[STA]-[IVSTA]-[GT]-x(3)-[NQR]-x-G-[NH]-x(2)-P-[RT].
NAME: Dihydroorotate dehydrogenase signature 2. 

CONSENSUS: [LIV](2)-[GSA]-x-G-G-[IV]-x-[STGN]-x(3)-[ACV]-x(6)-G-A.
NAME: Coproporphyrinogen III oxidase signature.

CONSENSUS: K-x-W-C-x(2)-[FYH](3)-[LIVM]-x-H-R-x-E-x-R-G-[LIVM]-G-G-[LIVM]-F-F-D.

NAME: Fumarate reductase / succinate dehydrogenase FAD-binding site. 
CONSENSUS: R-[ST]-H-[ST]-x(2)-A-x-G-G. 

NAME: Acyl-CoA dehydrogenases signature 1.

CONSENSUS: [GAC]-[LIVM]-[ST]-E-x(2)-[GSAN]-G-[ST]-D-x(2)-[GSA].
NAME: Acyl-CoA dehydrogenases signature 2. 

CONSENSUS: [QDE]-x(2)-G-[GS]-x-G-[LIVMFY]-x(2)-[DEN]-x(4)-[KR]-x(3)-[DEN].

NAME: Alanine dehydrogenase & pyridine nucleotide transhydrogenase signature 1.
CONSENSUS: G-[LIVM]-P-x-E-x(3)-N-E-x(1,3)-R-V-A-x-[ST]-P-x-[GST]-V-x(2)-L-x-[KRH]-
CONSENSUS: x-G.

NAME: Alanine dehydrogenase & pyridine nucleotide transhydrogenase signature 2. 
CONSENSUS: [LIVM](2)-G-[GA]-G-x-A-G-x(2)-[SA]-x(3)-[GA]-x-[SG]-[LIVM]-G-A-x-V-
CONSENSUS: x(3)-D. 

NAME: Glu / Leu / Phe / Val dehydrogenases active site. 
CONSENSUS: [LIV]-x(2)-G-G-[SAG]-K-x-[GV]-x(3)-[DNST]-[PL].

NAME: D-amino acid oxidases signature. 

CONSENSUS: [LIVM](2)-H-[NHA]-Y-G-x-[GSA](2)-x-G-x(5)-G-x-A.

NAME: Pyridoxamine 5'-phosphate oxidase signature.
CONSENSUS: [LIVF]-E-F-W-[QHG]-x(4)-R-[LIVM]-H-[DNE]-R.

NAME: Copper amine oxidase topaquinone signature. 

CONSENSUS: [LIVM]-[LIVMA]-[LIVM]-x(4)-T-x(2)-N-Y-[DE]-[YN].

NAME: Copper amine oxidase copper-binding site signature. 
CONSENSUS: T-x-G-x(2)-H-[LIVMF]-x(3)-E-[DE]-x-P. 

NAME: Lysyl oxidase putative copper-binding region signature. 
CONSENSUS: W-E-W-H-S-C-H-Q-H-Y-H. 

NAME: Delta 1-pyrroline-5-carboxylate reductase signature.

CONSENSUS: [PALF]-x(2,3)-[LIV]-x(3)-[LIVM]-[STAC]-[STV]-x-[GAN]-G-x-T-x(2)-[AG]-
CONSENSUS: [LIV]-x(2)-[LMF]-[DENQK]. 

NAME: Dihydrofolate reductase signature. 

CONSENSUS: [LVAGq-[UR-G-x<4MLIVMF]-P^ 

NAME: Tetrahydrofolate dehydrogenase/cyclohydrolase signature 1.
CONSENSUS: [E<a-x-[EQiq-[UVM](2Vx(2>-[lJ^^
CONSENSUS: Q-L-P-[LV].

NAME: Tetrahydrofolate dehydrogenase/cyclohydrolase signature 2. 
CONSENSUS: P-G-G-V-G-P-[MF]-T-[IV].

NAME: Oxygen oxidoreductases covalent FAD-binding site. 

CONSENSUS: P-x(10)-[DE]-[LIVM]-x(3)-[LIVM]-x(9)-[LIVM]-x(3)-[GSA]-[GST]-G-H.

NAME: Pyridine nucleotide-disulphide oxidoreductases class-I active site. 
CONSENSUS: G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P.

NAME: Pyridine nucleotide-disulphide oxidoreductases class-II active site. 
CONSENSUS: C-x(2)-C-D-[GA]-x(2,4)-[FY]-x(4)-[LIVM]-x-[LIVM](2)-G(3)-[DN].

NAME: Respiratory-chain NADH dehydrogenase subunit 1 signature 1.

CONSENSUS: G-[LIVMFYKRS]-[LIVMAGP]-Q-x-[LIVMFY]-x-D-[AGIM]-[LIVMFTA]-K-[LVMYST]-
CONSENSUS: [LIVMFYG]-x-[KRHEQG].

NAME: Respiratory-chain NADH dehydrogenase subunit 1 signature 2.

CONSENSUS: P-F-D-[LIVMFYQ]-[STAGPVM]-E-[GAC]-E-x-[EQ]-[LIVMS]-x(2)-G.

NAME: Respiratory-chain NADH dehydrogenase 20 Kd subunit signature. 

CONSENSUS: [GN]-x-D-[KRST]-[LIVMF](2)-P-[IV]-D-[LIVMFYW](2)-x-P-x-C-P-[PT].

NAME: Respiratory-chain NADH dehydrogenase 24 Kd subunit signature. 
CONSENSUS: D-x(2)-F-[ST]-x(5)-C-L-G-x-C-x(2)-[GA]-P.

NAME: Respiratory chain NADH dehydrogenase 30 Kd subunit signature. 

CONSENSUS: E-R-E-x(2)-[DE]-[LIVMF](2)-x(6)-[HK]-x(3)-[KRP]-x-[LIVM]-[LIVMS].

NAME: Respiratory chain NADH dehydrogenase 49 Kd subunit signature. 
CONSENSUS: [LIVMH]-H-[RT]-[GA]-x-E-K-[LIVMT]-x-E-x-[KRQ].

NAME: Respiratory-chain NADH dehydrogenase 51 Kd subunit signature 1. 
CONSENSUS: G-[AM]-G-[AR]-Y-[LIVM]-C-G-[DE](2)-[STA](2)-[LIM](2)-[EN]-S.

NAME: Respiratory-chain NADH dehydrogenase 51 Kd subunit signature 2. 
CONSENSUS: E-S-C-G-x-C-x-P-C-R-x-G. 

NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 1. 
CONSENSUS: P-x(2)-C-[YWS]-x(7)-G-x-C-R-x-C. 
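
As an illustration of how a consensus of this kind is applied, the sketch below scans a protein sequence for every position at which the pattern above matches. It is not part of the original listing: the regular expression is written by hand from the consensus, the function name find_signature is hypothetical, and the test sequence is an invented string rather than a real protein. The pattern is wrapped in a lookahead so that overlapping occurrences would also be reported.

```python
import re

# Hand translation of P-x(2)-C-[YWS]-x(7)-G-x-C-R-x-C; the lookahead lets
# finditer report matches that overlap one another.
SIGNATURE = re.compile(r"(?=(P.{2}C[YWS].{7}G.CR.C))")


def find_signature(sequence: str):
    """Return (1-based start position, matched segment) for each occurrence."""
    return [(m.start() + 1, m.group(1)) for m in SIGNATURE.finditer(sequence)]


if __name__ == "__main__":
    # Invented toy sequence, built piecewise so the spacing is easy to check.
    toy = "MAA" + "PKLCY" + "A" * 7 + "GTCRQC" + "LL"
    for position, segment in find_signature(toy):
        print(position, segment)
```

Positions are reported 1-based, following the usual convention for residue numbering.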

NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 2. 
CONSENSUS: C-P-x-C-[DE]-x-[GS](2)-x-C-x-L-Q. 

NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 3. 
CONSENSUS: R-C-[LIVM]-x-C-x-R-C-[LIVM]-x-[FY]. 

NAME: Nitrite and sulfite reductases iron-sulfur/siroheme-binding site.
CONSENSUS: [STV]-G-C-x(3)-C-x(6)-[DE]-[LIVMF]-[GAT]-[LIVMF].

NAME: Uricase signature.

CONSENSUS: L-x-[LV]-L-K-[ST]-T-x-S-x-F-x(2)-[FY]-x(4)-[FY].

NAME: Heme-copper oxidase catalytic subunit, copper B binding region signature. 
CONSENSUS: [YWG]-[LIVFYWTA](2)-[VGS]-H-[LNP]-x-V-x(44,47)-H-H.

NAME: CO II and nitrous oxide reductase dinuclear copper centers signature. 
CONSENSUS: V-x-H-x(33,40)-C-x(3)-C-x(3)-H-x(2)-M.

NAME: Cytochrome c oxidase subunit Vb, zinc binding region signature. 
CONSENSUS: [LIVM](2)-[FYW]-x(10)-C-x(2)-C-G-x(2)-[FY]-K-L.

NAME: Multicopper oxidases signature 1. 

CONSENSUS: G-x-(FYW]-x-[LIVMFY^x-(CSThx(^ 

NAME: Multicopper oxidases signature 2. 
CONSENSUS: H-C-H-x(3)-H-x(3)-[AG]-[LM].

NAME: Peroxidases proximal heme-ligand signature. 
CONSENSUS: [DET]-[LIV\n^x(2HUVM]-[W 

NAME: Peroxidases active site signature. 

CONSENSUS: [SGATV]-x(3)-[LIVMA]-R-[LIVMA]-x-[FW]-H-x-[SAC].

NAME: Catalase proximal heme-ligand signature.

CONSENSUS: R-[LIVMFSTAN]-F-[GASTNP]-Y-x-D-[AST]-[QEH].

NAME: Catalase proximal active site signature. 

CONSENSUS: [IF]-x-[RH]-x(4)-[EQ]-R-x(2)-H-x(2)-[GAS]-[GASTF]-[GAST].
NAME: Glutathione peroxidases selenocysteine active site.

CONSENSUS: [GN]-[RKHNFYC]-x-[LIVMFC]-[LIVMF](2)-x-N-[VT]-x-[STC]-x-C-[GA]-x-T.

NAME: Glutathione peroxidases signature 2. 
CONSENSUS: [LIV]-[AGD]-F-P-[CS]-[NG]-Q-F.

NAME: Lipoxygenases iron-binding region signature 1. 

CONSENSUS: H-[EQ]-x(3)-H-x-[LM]-[NQRC]-[GST]-H-[LIVMSTAC](3)-E.

NAME: Lipoxygenases iron-binding region signature 2. 

CONSENSUS: [LIVMA]-H-P-[LIVM]-x-[KRQ]-[LIVMF](2)-x-[AP]-H.

NAME: Extradiol ring-cleavage dioxygenases signature. 

CONSENSUS: [GNTIV]-x-H-x(5,7)-[LIVMF]-Y-x(2)-[DENTA]-P-x-[GP]-x(2,3)-E.
NAME: Intradiol ring-cleavage dioxygenases signature.

CONSENSUS: [LIVM]-x-G-x-[LIVM]-x(4)-[GS]-x(2)-[LIVM]-x(4)-[LIVM]-[DE]-[LIVMFY]-
CONSENSUS: x(6)-G-x-[FY]. 

NAME: Indoleamine 2,3-dioxygenase signature 1.
CONSENSUS: G-G-S-[AN]-[GA]-Q-S-S-x(2)-Q.

NAME: Indoleamine 2,3-dioxygenase signature 2. 

CONSENSUS: [FY]-L-[DQ]-[DE]-[LIVM]-x(2)-Y-M-x(3)-H-[KR].

NAME: Bacterial ring hydroxylating dioxygenases alpha-subunit signature.
CONSENSUS: C-x-H-R-[GA]-x(8)-G-N-x(5)-C-x-[FY]-H.

NAME: Bacterial luciferase subunits signature. 

CONSENSUS: [GA]-[LIVM]-P-[LIVM]-x-[LIVMFY]-x-W-x(6)-[RK]-x(6)-Y-x(3)-[AR].

NAME: ubiH/COQ6 monooxygenase family signature.
CONSENSUS: H-P-[LIV]-[AG]-G-Q-G-x-N-x-G-x(2)-D.

NAME: Biopterin-dependent aromatic amino acid hydroxylases signature. 
CONSENSUS: P-D-x(2)-H-[DE]-[LI]-[LIVMF]-G-H-[LIVMC]-P.

NAME: Copper type II, ascorbate-dependent monooxygenases signature 1.
CONSENSUS: H-H-M-x(2)-F-x-C. 

NAME: Copper type II, ascorbate-dependent monooxygenases signature 2. 
CONSENSUS: H-x-F-x(4)-H-T-H-x(2)-G.

NAME: Tyrosinase CuA-binding region signature. 

CONSENSUS: H-x(4,5)-F-[LIVMFTP]-x-[FW]-H-R-x(2)-[LM]-x(3)-E.

NAME: Tyrosinase and hemocyanins CuB-binding region signature. 
CONSENSUS: D-P-x-F-[LIVMFYW]-x(2)-H-x(3)-D.

NAME: Fatty acid desaturases family 1 signature.
CONSENSUS: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y.

NAME: Fatty acid desaturases family 2 signature. 

CONSENSUS: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-D-Y-x(2)-[LIVMFYW]-[LIVM]-[DE].

NAME: Cytochrome P450 cysteine heme-iron ligand signature. 
CONSENSUS: [FW]-[SGNH]-x-[GD]-x-[RHPT]-x-C-[LIVMFAP]-[GAD].

NAME: Heme oxygenase signature. 
CONSENSUS: L-L-V-A-H-A-Y-T-R. 

NAME: Copper/Zinc superoxide dismutase signature 1. 

CONSENSUS: [GA]-[IFAT]-H-[LIVF]-H-x(2)-[GP]-[SDG]-x-[STAGD].

NAME: Copper/Zinc superoxide dismutase signature 2. 
CONSENSUS: G-[GN]-[SGA]-G-x-R-x-[SGA]-C-x(2)-[IV].

NAME: Manganese and iron superoxide dismutases signature.
CONSENSUS: D-x-W-E-H-[STA]-[FY](2).

NAME: Ribonucleotide reductase large subunit signature. 

CONSENSUS: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH]-x(3)-[STAQLIVM]-[ASC]-x(2)-
CONSENSUS: [PA]. 

NAME: Ribonucleotide reductase small subunit signature. 

CONSENSUS: [IVMSEQ]-E-x(1,2)-[LIVTA]-[HY]-[GSA]-x-[STAVM]-Y-x(2)-[LIVMQ]-x(3)-
CONSENSUS: [LIFY]-[IVFYCSA].

NAME: Nitrogenases component 1 alpha and beta subunits signature 1.
CONSENSUS: [LIVMFYH]-[LIVMFST]-H-[AG]-[AGSP]-[LIVMNQA]-[AG]-C.

NAME: Nitrogenases component 1 alpha and beta subunits signature 2.

CONSENSUS: [STANQ]-[ET]-C-x(5)-G-D-[DN]-[LIVMT]-x-[STAGR]-[LIVMFYST].

NAME: NifH/frxC family signature 1.

CONSENSUS: E-x-G-G-P-x(2)-[GA]-x-G-C-[AG]-G.

NAME: NifH/frxC family signature 2. 
CONSENSUS: D-x-L-G-D-V-V-C-G-G-F-[AG]-x-P.

NAME: Nickel-dependent hydrogenases large subunit signature 1.
CONSENSUS: R-G-[LIVMF]-E-x(15)-[QESM]-R-x-C-G-[LIVM]-C.

NAME: Nickel-dependent hydrogenases large subunit signature 2. 
CONSENSUS: [FY]-D-P-C-[LIM]-[ASG]-C-x(2,3)-H.

NAME: Glutamyl-tRNA reductase signature. 

CONSENSUS: H-[LIVM]-x(2)-[LIVM]-[GSTAC](3)-[LIVM]-[DEQ]-S-[LIVMA]-[LIVM](2)-[GF]-E-
CONSENSUS: x-[QR]-[IV]-[LT]-[STAG]-Q-[LIVM]-[KR].

NAME: Bacterial-type phytoene dehydrogenase signature.

CONSENSUS: [NG]-x-[FYWV]-[LIVMF]-x-G-[AGC]-[GS]-[TA]-[HQT]-P-G-[STAV]-G-[LIVM]-
CONSENSUS: x(5)-[GS].

NAME: Glycine radical signature. 

CONSENSUS: [STIV]-x-R-[IVT]-[CSA]-G-Y-x-[GACV].

NAME: Ergosterol biosynthesis ERG4/ERG24 family signature 1.
CONSENSUS: G-x(2)-[LIVM]-Y-D-x-[FY]-x-G-x(2)-L-N-P-R.

NAME: Ergosterol biosynthesis ERG4/ERG24 family signature 2. 
CONSENSUS: [LIVM](2)-H-R-x(2)-R-D-x(3)-C-x(2)-K-Y-G.

NAME: NNMT/PNMT/TEMT family of methyltransferases signature.
CONSENSUS: L-I-D-I-G-S-G-P-T-[IV]-Y-Q-L-L-S-A-C.

NAME: RNA methyltransferase trmA family signature 1.

CONSENSUS: [DN]-P-[PA]-R-x-G-x(14,16)-[LIVM](2)-Y-x-S-C-N-x(2)-T.

NAME: RNA methyltransferase trmA family signature 2. 
CONSENSUS: [LIVMF]-D-x-F-P-[QHY]-[ST]-x-H-[LIVMFY]-E.

NAME: Thymidylate synthase active site. 

CONSENSUS: R-x(2)-[UVM]-x(3MFWHQW-x(8,9M^ 

CONSENSUS: x-[LV].

NAME: Ribosomal RNA adenine dimethylases signature.
CONSENSUS: [imfl-[LIvWY>tDE]-x-G-[STAPV]-G-x-[GAM 
CONSENSUS: x(6)-[LIVMY]-x-[STAGV]-[LIVMFYHC]-E-x-D.

NAME: Methylated-DNA-protein-cysteine methyltransferase active site.
CONSENSUS: [LIVMF]-P-C-H-R-[LIVMF](2).

NAME: N-6 Adenine-specific DNA methylases signature.
CONSENSUS: [LIVMAC]-[LIVFYWA]-x-[DN]-P-P-[FYW].

NAME: N-4 cytosine-specific DNA methylases signature.
CONSENSUS: [LIVMF]-T-S-P-P-[FY].

NAME: C-5 cytosine-specific DNA methylases active site.

CONSENSUS: [DENKS]-x-[FLIV]-x(2)-[GSTC]-x-P-C-x(2)-[FYWLIM]-S.

NAME: C-5 cytosine-specific DNA methylases C-terminal signature.

CONSENSUS: (RKQGTF]-A(2)^-N-[STAG]-[LIVMF]-x(3HLrVMTl-x(3V(LrVMJ.x(3)-[LIVM^ 

NAME: Protein-L-isoaspartate(D-aspartate) O-methyltransferase signature.
CONSENSUS: [GSA]-D-G-x(2)-G-[FYWV]-x(3)-[AS]-P-[FY]-[DN]-x-I.

NAME: Uroporphyrin-III C-methyltransferase signature 1.
CONSENSUS: [LIVMMGSHSTAL]-G-P-G-x(3MLIVMFY]-[l^ 

NAME: Uroporphyrin-III C-methyltransferase signature 2.

CONSENSUS: V-x(2)-[LI]-x(2)-G-D-x(3)-[FYW]-[GS]-x(8)-[LIVF]-x(5,6)-[LIVMFYWPAC]-
CONSENSUS: x-[LIVMY]-x-P-G.

NAME: ubiE/COQ5 methyltransferase family signature 1.
CONSENSUS: Y-D-x-M-N-x(2)-[LIVM]-S-x(3)-H-x(2)-W.

NAME: ubiE/COQ5 methyltransferase family signature 2. 

CONSENSUS: R-V-[LIVM]-K-[PV]-G-G-x-[LIVMF]-x(2)-[LIVM]-E-x-S.

NAME: Serine hydroxymethyltransferase pyridoxal-phosphate attachment site. 

CONSENSUS: [DEH]-[LIVMFY]-x-[STMV]-[GST]-[ST](2)-H-K-[ST]-[LF]-x-G-[PAC]-[RQ]-

CONSENSUS: [GSA]-[GA].

NAME: Phosphoribosylglycinamide formyltransferase active site.

CONSENSUS: G-x-[STM]^WT]"X-{FYWVQ3.lVMAT]-x-{DEVM]-x-[LIVMY]-D-x-G-x(2)-(LrAT]- 
CONSENSUS: x(6)-[LIVM]. 

NAME: Aspartate and ornithine carbamoyl transferases signature. 
CONSENSUS: F-x-[EK]-x-S-[GT]-R-T.

NAME: Transketolase signature 1.

CONSENSUS: R-x(3)-[LIVMTA]-[DENQSTHKF]-x(5,6)-[GSN]-G-H-[PLIVMF]-[GSTA]-x(2)-
CONSENSUS: [LIMC]-[GS].

NAME: Transketolase signature 2.

CONSENSUS: G-[DEQGSA]-[DN]-G-[PAEQ]-[ST]-[HQ]-x-[PAGM]-[LIVMYAC]-[DEFYW]-x(2)-
CONSENSUS: [STAP]-x(2)-[RGA].

NAME: Transaldolase signature 1.

CONSENSUS: [DG]-[IVSA]-T-[ST]-N-P-[STA]-[LIVMF](2).
NAME: Transaldolase active site. 

CONSENSUS: [UVM>x-[UVM]-K-[LIVM]-[PAS|-M^ 
CONSENSUS: [QEKRST]-x-[LIVM]. 

NAME: Acyltransferases ChoActase / COT / CPT family signature 1.

CONSENSUS: [LI]-P-x-[LVP]-P-[IVTA]-P-x-[LIVM]-x-[DENQAS]-[ST]-[LIVM]-x(2)-[LY].
NAME: Acyltransferases ChoActase / COT / CPT family signature 2. 

CONSENSUS: R-[FYW]-x-[DA]-[KA]-x(0,1)-[LIVMFY]-x-[LIVMFY](2)-x(3)-[DNS]-[GSA]-x(6)-
CONSENSUS: [DE]-[HS]-x(3)-[DE]-[GA].

NAME: Thiolases acyl-enzyme intermediate signature.

CONSENSUS: [LIVM]-[NST]-x(2)-C-[SAGLI]-[ST]-[SAG]-[LIVMFYNS]-x-[STAG]-[LIVM]-x(6)-
CONSENSUS: [LIVM]. 

NAME: Thiolases signature 2.

CONSENSUS: N-x(2)-G-G-x-[LIVM]-[SA]-x-G-H-P-x-G-x-[ST]-G.
NAME: Thiolases active site. 

CONSENSUS: [AG]-[LIVMA]-[STAGLIVM]-[STAG]-[LIVMA]-C-x-[AG]-x-[AG]-x-[AG]-x-[SAG].

NAME: Chloramphenicol acetyltransferase active site.
CONSENSUS: Q-[LIV]-H-H-[SA]-x(2)-D-G-[FY]-H.

NAME: Hexapeptide-repeat containing-transferases signature.

CONSENSUS: [LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIVAC]-x-[LIV]-[GAED]-x(2)-
CONSENSUS: [STAVR]-x-[LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIV].

NAME: Beta-ketoacyl synthases active site. 

CONSENSUS: G-x(4)-[LIVMFAP]-x(2)-[AGC]-C-[STA](2)-[STAG]-x(3)-[LIVMF].
NAME: Chalcone and stilbene synthases active site.
CONSENSUS: R-[LIVMFi'S]-x-[LiVM]-x-[QHG]^^^
CONSENSUS: [RA]. 

NAME: Myristoyl-CoA:protein N-myristoyltransferase signature 1.
CONSENSUS: E-I-N-F-L-C-x-H-K.

NAME: Myristoyl-CoA:protein N-myristoyltransferase signature 2. 
CONSENSUS: K-F-G-x-G-D-G. 

NAME: Gamma-glutamyltranspeptidase signature.

CONSENSUS: T-[STA]-H-x-[ST]-[LIVMA]-x(4)-G-[SN]-x-V-[STA]-x-T-x-T-[LIVM]-[NE]-
CONSENSUS: x(1,2)-[FY]-G.

NAME: Transglutaminases active site. 

CONSENSUS: [GT]-Q-[CA]-W-V-x-[SA]-[GA]-[IVT]-x(2)-T-x-[LMSC]-R-[CSA]-[LV]-G.

NAME: Phosphorylase pyridoxal-phosphate attachment site.
CONSENSUS: E-A-[SC]-G-x-[GS]-x-M-K-x(2)-[LM]-N.

NAME: UDP-glycosyltransferases signature.

CONSENSUS: [FW]-x<2)-(^x(2)-[LIVMYAh[LIMV]-A(4,6HL^^ 

CONSENSUS: [HNQ]-[STAGC]-G-x(2)-[STAG]-x(3)-[STAGL]-[LIVMFA]-x(4)-[PQR]-[LIVMT]-
CONSENSUS: x(3)-[PA]-x(3)-[DES]-[QEHN]. 

NAME: Purine/pyrimidine phosphoribosyl transferases signature.

CONSENSUS: [LIVMFYWCTA]-[LIVM]-[LIVMA]-[LIVMFC]-[DE]-D-[LIVMS]-[LIVM]-[STAVD]-
CONSENSUS: [STAR]-[GAC]-x-[STAR].

NAME: Glutamine amidotransferases class-I active site.
CONSENSUS: [PAS]-[UVMFiT]-[LIVMFY]^[LI^^ 

NAME: Glutamine amidotransferases class-II active site.
CONSENSUS: <x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG].

NAME: Purine and other phosphorylases family 1 signature. 
CONSENSUS: [GST]-x-G-[LIVM]-G-x-[PA]-S-x-[GSTA]-I-x(3)-E-L.

NAME: Purine and other phosphorylases family 2 signature. 

CONSENSUS: [LIV]-x(3)-G-x(2)-H-x-[LIVMFY]-x(4)-[LIVMF]-x(3)-[ATV]-x(1,2)-[LIVM]-x-
CONSENSUS: [ATV]-x(4)-[GN]-x(3,4)-[LIVMF](2)-x(2)-[STN]-[SA]-x-G-[GS]-[LIVM].

NAME: Thymidine and pyrimidine-nucleoside phosphorylases signature. 
CONSENSUS: S-[GS]-R-[GA]-[LIV]-x(2)-[TA]-[GA]-G-T-x-D-x-[LIV]-E.

NAME: ATP phosphoribosyl transferase signature. 

CONSENSUS: E-x(5)-G-x-[SAG]-x(2)-[IV]-x-D-[LIV]-x(2)-[ST]-G-x-T-[LM].

NAME: NAD:arginine ADP-ribosyltransferases signature.
CONSENSUS: [FY]-x-[FY]-K-x(2)-H-[FY]-x-L-[ST]-x-A.

NAME: Prolipoprotein diacylglyceryl transferase signature. 
CONSENSUS: G-R-x-[GA]-N-F-[LIVMF]-N-x-E-x(2)-G.

NAME: S-adenosylmethionine synthetase signature 1. 
CONSENSUS: G-A-G-D-Q-G-x(3)-G-Y. 

NAME: S-adenosylmethionine synthetase signature 2. 
CONSENSUS: G-[GA]-G-[ASC]-F-S-x-K-[DE].

NAME: Polyprenyl synthetases signature 1.

CONSENSUS: [LIVM](2)-x-D-D-x(2,4)-D-x(4)-R-R-[GH].

NAME: Polyprenyl synthetases signature 2. 

CONSENSUS: (UVMFi1^x(2)-[Fn-]-Q-[LIVM]-x-D-D-[LIVMFYl.x.[DNG]. 
NAME: Squalene and phytoene synthases signature 1.

CONSENSUS: Y4CSANfl-x(2)^SG]-A-{GSA]-(UVAT]-aV]^-xa)^SCl-x(2>-fU^ . 

NAME: Squalene and phytoene synthases signature 2. 
CONSENSUS: [UVM)^-x(3>^-x<2.3)-N-[IFJ-x-R-D^^ 
CONSENSUS: x-P. 

NAME: Protein prenyltransferases alpha subunit repeat signature.
CONSENSUS: p>SttVJ-x-[>n>FV]-[NEQIY] -x-0-im

NAME: Riboflavin synthase alpha chain family signature.

CONSENSUS: [LIVMF]-x(5)-G-[STADNQ]-[KREQIYW]-V-N-[LIVM]-E.

NAME: Dihydropteroate synthase signature 1.

CONSENSUS: [LIVM]-x-[AG]-[LIVMF](2)-N-x-T-x-D-S-F-x-D-x-[SG].
NAME: Dihydropteroate synthase signature 2. 

CONSENSUS: [GE]-[SA]-x-[LIVM](2)-D-[LIVM]-G-[GP]-x(2)-[STA]-x-P.
NAME: EPSP synthase signature 1. 

CONSENSUS: [LIVM]-x(2)-[GN]-N-[SA]-G-T-[STA]-x-R-x-[LIVMY]-x-[GSTA].
NAME: EPSP synthase signature 2. 

CONSENSUS: [KR]-x-[KH]-E-[CST]-[DNE]-R-[LIVM]-x-[STA]-[LIVMC]-x(2)-[EN]-[LIVMF]-x-
CONSENSUS: [KRA]-[LIVMF]-G.

NAME: FLAP/GST2/LTC4S family signature. 
CONSENSUS: G-x(3)-F-E-R-V-[FY]-x-A-[NQ]-x-N-C. 

NAME: Aminotransferases class-I pyridoxal-phosphate attachment site. 

CONSENSUS: [GS]-fUVMFYTAC]-tGSTA]-K-x(2HGSALV^^-[LIVMFA]-x-[GNAR^x-R-[LIVMA^ 
CONSENSUS: [GAJ. 

NAME: Aminotransferases class-II pyridoxal-phosphate attachment site. 

CONSENSUS: T-[LIVMFYW]-[STAG]-K-[SAG]-[LIVMFYWR]-[SAG]-x(2)-[SAG].

NAME: Aminotransferases class-III pyridoxal-phosphate attachment site.
CONSENSUS: [LIVMFYWC](2)-x-D-E-[LIVMA]-x(2)-[GP]

CONSENSUS: [GSAD]-x(12,16)-D-[LIVMFYWC]-x(2,3)-[GSA]-K-x(3)-[GSTADN]-[GSA].
NAME: Aminotransferases class-IV signature.

CONSENSUS: E-x-[STAGCI]-x(2)-N-[LIVMFAC]-[FY]-x(6,12)-[LIVMF]-x-T-x(6,8)-[LIVM]-x-
CONSENSUS: [GS]-[LIVM]-x-[KR].

NAME: Aminotransferases class-V pyridoxal-phosphate attachment site. 

CONSENSUS: [LIVFYCHT]-[DGH]-[LIVMFYAC]-[LIVMFYA]-x(2)-[GSTAC]-[GSTA]-[HQR]-K-
CONSENSUS: x(4,6)-G-x-[GSAT]-x-[LIVMFYSAC].

NAME: Hexokinases signature. 

CONSENSUS: [LIVM]-G-F-[TN]-F-S-[FY]-P-x(5)-[LIVM]-[DNST]-x(3)-[LIVM]-x(2)-W-T-K-x-
CONSENSUS: [LF].

NAME: Galactokinase signature.

CONSENSUS: G-R-x-N-[LIV]-I-G-E-H-x-D-Y.

NAME: GHMP kinases putative ATP-binding domain. 

CONSENSUS: [LIVM]-[PK]-x-[GSTA]-x(0,1)-G-L-[GS]-S-S-[GSA]-[GSTAC].

NAME: Phosphofructokinase signature.

CONSENSUS: [RK]-x(4)-G-H-x-Q-[QR]-G-G-x(5)-D-R.

NAME: pfkB family of carbohydrate kinases signature 1. 
CONSENSUS: [AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G.

NAME: pfkB family of carbohydrate kinases signature 2. 

CONSENSUS: [DNSK]-[PSTV]-x-[SAG](2)-[GD]-D-x(3)-[SAGV]-[AG]-[LIVMFY]-[LIVMSTAP].
NAME: ROK family signature. 

CONSENSUS: [LIVM]-x(2)-G-[LIVMFCT]-G-x-[GA]-[LIVMFA]-x(8)-G-x(3,5)-[GATP]-x(2)-
CONSENSUS: G-[RKH].

NAME: Phosphoribulokinase signature.

CONSENSUS: K-[LIVM]-x-R-D-x(3)-R-G-x-[ST]-x-E.

NAME: Thymidine kinase cellular-type signature. 

CONSENSUS: [GA]-x(1,2)-[DE]-x-Y-x-[STAP]-x-C-[NKR]-x-[CH]-[LIVMFYWH].

NAME: FGGY family of carbohydrate kinases signature 1.
CONSENSUS: rMF*Wx-[PST|-xa)-K-{LIVMFYW]-x-W-^ 

NAME: FGGY family of carbohydrate kinases signature 2. 
CONSENSUS: [GSA)-x-[UVMFYW].x^-[LIVM>x<7,8>-[^ 
CONSENSUS: fUVMFY]-|DEQl. 



NAME: Protein kinases ATP-binding region signature. 

CONSENSUS: [LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x-[GSTACLIVMFY]-
CONSENSUS: x(5,18)-[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K.

NAME: Serine/Threonine protein kinases active-site signature. 

CONSENSUS: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3).

NAME: Tyrosine protein kinases specific active-site signature. 

CONSENSUS: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3).

NAME: Protein kinase domain profile. 

NAME: Casein kinase II regulatory subunit signature. 

CONSENSUS: C-P-x-[LIVMY]-x-C-x(5)-L-P-[LIVMC]-G-x(9)-V-[KR]-x(2)-C-P-x-C.
NAME: Pyruvate kinase active site signature. 

CONSENSUS: [LIVAC]-x-[LIVM](2)-[SAPCV]-K-[LIV]-E-[NKRST]-x-[DEQH]-[GSTA]-[LIVM].
NAME: Shikimate kinase signature. 

CONSENSUS: [KR]-x(2)-E-x(3)-[LIVMF]-x(8,12)-[LIVMF](2)-[SA]-x-G(3)-x-[LIVMF].

NAME: Prokaryotic diacylglycerol kinase signature. 
CONSENSUS: E-x-[LIVM]-N-[ST]-[SA]-[LIV]-E-x(2)-V-D.

NAME: Phosphatidylinositol 3- and 4-kinases signature 1.

CONSENSUS: [LIVMFAC]-K-x(1,3)-[DEA]-[DE]-[LIVMC]-R-Q-[DE]-x(4)-Q.
NAME: Phosphatidylinositol 3- and 4-kinases signature 2.

CONSENSUS: [GS]-x-[AV]-x(3)-[LIVM]-x(2)-[FYH]-[LIVM](2)-x-[LIVMF]-x-D-R-H-x(2)-N.

NAME: Acetate and butyrate kinases family signature 1.
CONSENSUS: [LIVM](2)-x-[LIVM]-N-x-G-S-[ST]-S-x-[KE].

NAME: Acetate and butyrate kinases family signature 2. 

CONSENSUS: [LIVMA](2)-x(2)-H-x-G-x-G-x-[ST]-[LIVM]-x-[AV]-x(3)-G.

NAME: Phosphoglycerate kinase signature.

CONSENSUS: [KRHGTCV]-[VT]-[LIVMF]-[LIVMC]-R-x-D-x-N-[SACV]-P.
NAME: Aspartokinase signature. 

CONSENSUS: [LIVM]-x-K-[FY]-G-G-[ST]-[SC]-[LIVM].
NAME: Glutamate 5-kinase signature.

CONSENSUS: [GSTN]-x(2)-G-x-G-[GC]-[IM]-x-[STA]-K-[LIVM]-x-[SA]-[TCA]-x(2)-[GALV]-
CONSENSUS: x(3)-G. 

NAME: ATP:guanido phosphotransferases active site. 
CONSENSUS: C-P-x(0,1)-[ST]-N-[IL]-G-T.

NAME: PTS HPR component histidine phosphorylation site signature. 
CONSENSUS: G-[LIVM]-H-[STA]-R-[PA]-[GSTA]-[STAM].

NAME: PTS HPR component serine phosphorylation site signature. 

CONSENSUS: [GSADE)-[iaiEQTV]-x(4)-[KR^-S-[LIVM^(2>-x-[LIVM]-x(2)-[LIVM]-[G 

NAME: PTS EIIA domains phosphorylation site signature 1. 
CONSENSUS: G-x(2)-[LIVMF](3)-H-[LIVMF]-G-[LIVMF]-x-T-[AV].

NAME: PTS EIIA domains phosphorylation site signature 2. 

CONSENSUS: [DENQ]-x(6)-[LIVMF]-[GA]-x(2)-[LIVM]-A-[LIVM]-P-H-[GAC].

NAME: PTS EIIB domains cysteine phosphorylation site signature. 

CONSENSUS: N-[LIVMFY]-x(5)-C-x-T-R-[LIVMF]-x-[LIVMF]-x-[LIVM]-x-[DQ].

NAME: Adenylate kinase signature. 

CONSENSUS: [LIVMFYW](3)-D-G-[FYI]-P-R-x(3)-[NQ].

NAME: Nucleoside diphosphate kinases active site. 
CONSENSUS: N-x(2)-H-[GA]-S-D-[SA]-[LIVMPKNE].

NAME: Guanylate kinase signature. 

CONSENSUS: T-[ST]-R-x(2)-[KR]-x(2)-[DE]-x(2)-G-x(2)-Y-x-[FY]-[LIVMK].

NAME: Guanylate kinase domain profile.

NAME: Phosphoribosyl pyrophosphate synthetase signature. 

CONSENSUS: D-[LI]-H-[SA]-x-Q-[IMST]-[QM]-G-[FY]-F-x(2)-P-[LIVMFC]-D.

NAME: 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase signature. 
CONSENSUS: G-[PE]-R-x(2)-D-L-D-[LIVM](2). 

NAME: Bacteriophage-type RNA polymerase family active site signature 1. 
CONSENSUS: P-[LIVM]-x(2)-D-[GA]-[ST]-[AC]-[SN]-[GA]-[LIVMFY]-Q.

NAME: Bacteriophage-type RNA polymerase family active site signature 2. 
CONSENSUS: [LIVMF]-x-R-x(3)-K-x(2)-[LIVMF]-M-[PT]-x(2)-Y.

NAME: Eukaryotic RNA polymerase II heptapeptide repeat. 
CONSENSUS: Y-[ST]-P-[ST]-S-P-[STANK].
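Because this signature describes a tandem repeat, a single sequence may match it many times. As an illustrative sketch, the Python below reads the pattern above as Y-[ST]-P-[ST]-S-P-[STANK], hand-translates it to a regular expression, and lists every match position. The function name `find_repeats` and the toy sequence are assumptions made for this example, not data from the document.

```python
import re

# Hand-translated regex for Y-[ST]-P-[ST]-S-P-[STANK]; wrapping it in a
# lookahead lets finditer report overlapping matches as well.
HEPTAD = re.compile(r"(?=(Y[ST]P[ST]SP[STANK]))")

def find_repeats(sequence):
    """Return (start, matched_text) for every position where the pattern matches."""
    return [(m.start(), m.group(1)) for m in HEPTAD.finditer(sequence)]

# Made-up sequence containing two consecutive heptapeptide repeats.
toy = "MAYSPTSPSYSPTSPKGG"
for start, text in find_repeats(toy):
    print(start, text)
```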

NAME: RNA polymerases beta chain signature. 

CONSENSUS: G-x-K-[LIVMFA]-[STAC]-[GSTN]-x-[HSTA]-[GS]-[QNH]-K-G-[IVT].
NAME: RNA polymerases M / 15 Kd subunits signature. 

CONSENSUS: F-C-x-[DEKST]-C-[GNK]-[DNSA]-[LIVMH]-[LIVM]-x(8,14)-C-x(2)-C.
NAME: RNA polymerases D / 30 to 40 Kd subunits signature. 

CONSENSUS: N-[SGA]-[LIVMF]-R-R-x(9)-[SA]-x(3)-V-x(4)-N-x-[STA]-x(3)-[DN]-E-x-[LI]-
CONSENSUS: [GA]-x-R-[LI]-[GA]-[LIVM](2)-P.

NAME: RNA polymerases H / 23 Kd subunits signature. 
CONSENSUS: H-[NEI]-[LIVM]-V-P-x-H-x(2)-[LIVM]-x(2)-[DE].

NAME: RNA polymerases K / 14 to 18 Kd subunits signature. 
CONSENSUS: [ST]-x-[FY]-E-x-[AT]-R-x-[LIVM]-[GSA]-x-R-[SA]-x-Q.

NAME: RNA polymerases L / 13 to 16 Kd subunits signature. 

CONSENSUS: [DE](2)-H-[ST]-[LIVM]-[GAP]-N-x(11)-V-x-[FM]-x(2)-Y-x(3)-H-P.

NAME: RNA polymerases N / 8 Kd subunits signature. 
CONSENSUS: [LIVMF](2)-P-[LIVM]-x-C-F-[ST]-C-G.

NAME: DNA polymerase family A signature. 

CONSENSUS: R-x(2)-[GSAV]-K-x(3)-[LIVMFY]-[AGQ]-x(2)-Y-x(2)-[GS]-x(3)-[LIVMA].
NAME: DNA polymerase family B signature. 

CONSENSUS: [YA]-[GLIVMSTAC]-D-T-D-[SG]-[LIVMFTC]-x-[LIVMSTAC].
NAME: DNA polymerase family X signature. 

CONSENSUS: G-[SG]-[LFY]-x-R-[GE]-x(3)-[SGCL]-x-D-[LIVM]-D-[LIVMFY](3)-x(2)-[SAP].

NAME: Galactose-1-phosphate uridyl transferase family 1 active site signature.
CONSENSUS: F-E-N-[RK]-G-x(3)-G-x(4)-H-P-H-x-Q. 

NAME: Galactose-1-phosphate uridyl transferase family 2 signature.
CONSENSUS: D-L-P-I-V-G-G-[ST]-[LIVM](2)-[SA]-H-[DEN]-H-[FY]-Q-G-G.

NAME: ADP-glucose pyrophosphorylase signature 1. 

CONSENSUS: [AG]-G-G-x-G-[STK]-x-L-x(2)-L-[TA]-x(3)-A-x-P-A-[LV].

NAME: ADP-glucose pyrophosphorylase signature 2. 
CONSENSUS: W-[FY]-x-G-[ST]-A-[DNSH]-[AS]-[LIVMFYW].

NAME: ADP-glucose pyrophosphorylase signature 3. 

CONSENSUS: [APV]-[GS]-M-G-[LIVMN]-Y-[IVC]-[LIVMFY]-x(2)-[DENPHK].
NAME: Phosphatidate cytidylyltransferase signature.

CONSENSUS: S-x-[LIVMF]-K-R-x(4)-K-D-x-[GSA]-x(2)-[LI]-[PG]-x-H-G-G-[LIVM]-x-D-R-
CONSENSUS: [LIVMFT]-D.

NAME: Ribonuclease PH signature.

CONSENSUS: C-[DE]-[LIVM](2)-Q-[GTA]-D-G-[SG]-x(2)-[TA]-A.

NAME: 2'-5'-oligoadenylate synthetases signature 1.

CONSENSUS: G-G-S-x-[AG]-[KR]-x-T-x-L-[KR]-[GST]-x-S-D-[AG].

NAME: 2'-5'-oligoadenylate synthetases signature 2.
CONSENSUS: R-P-V-I-L-D-P-x-[DE]-P-T.

NAME: CDP-alcohol phosphatidyltransferases signature.
CONSENSUS: D-G-x(2)-A-R-x(8)-G-x(3)-D-x(3)-D.

NAME: PEP-utilizing enzymes phosphorylation site signature.

CONSENSUS: G-[GA]-x-[TN]-x-H-[STA]-[STAV]-[LIVM](2)-[STAV]-[RG].

NAME: PEP-utilizing enzymes signature 2. 

CONSENSUS: [DEQS]-x-[LIVMF]-S-[LIVMF]-G-[ST]-N-D-[LIVM]-x-Q-[LIVMFYGT]-[STALIV]-
CONSENSUS: [LIVMF]-[GAS]-x(2)-R.

NAME: Rhodanese signature 1. 

CONSENSUS: [FY]-x(3)-H-[LIV]-P-G-A-x(2)-[LIVF].
NAME: Rhodanese C-terminal signature. 

CONSENSUS: [AV]-x(2)-[FY]-[DEAP]-G-[GSA]-[WF]-x-E-[FYW].
NAME: CoA transferases signature 1. 

CONSENSUS: [DN]-[GN]-x(2)-[LIVMFA](3)-G-G-F-x(3)-G-x-P.

NAME: CoA transferases signature 2. 

CONSENSUS: [LF]-[HQ]-S-E-N-G-[LIVF](2)-[GA].

NAME: Phospholipase A2 histidine active site. 
CONSENSUS: C-C-x(2)-H-x(2)-C.

NAME: Phospholipase A2 aspartic acid active site. 
CONSENSUS: [LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C.

NAME: Lipases, serine active site. 

CONSENSUS: [LIV]-x-[LIVFY]-[LIVMST]-G-[HYWV]-S-x-G-[GSTAC].

NAME: Colipase signature. 
CONSENSUS: Y-x(2)-Y-Y-x-C-x-C. 

NAME: Lipolytic enzymes "G-D-S-L" family, serine active site. 
CONSENSUS: [LIVMFYAG](4)-G-D-S-[LIVM]-x(1,2)-[TAG]-G.

NAME: Lipolytic enzymes "G-D-X-G" family, putative histidine active site. 
CONSENSUS: [LIVMF](2)-x-[LIVMF]-H-G-G-[SAG]-[FY]-x(3)-[STDN]-x(2)-[ST]-H.

NAME: Lipolytic enzymes "G-D-X-G" family, putative serine active site. 
CONSENSUS: [LIVM]-x-[LIVMF]-[SA]-G-D-S-[CA]-G-[GA]-x-L-[CA].

NAME: Carboxylesterases type-B serine active site.

CONSENSUS: F-[GR]-G-x(4)-[LIVM]-x-[LIV]-x-G-x-S-[STAG]-G.

NAME: Carboxylesterases type-B signature 2. 

CONSENSUS: [ED]-D-C-L-[YT]-[LIV]-[DNS]-[LIV]-[LIVFYW]-x-[PQR].
NAME: Pectinesterase signature 1.

CONSENSUS: [GSTW-x(5)-[UVM]-x-[LrVM]-x(2)-G-x^ 

NAME: Pectinesterase signature 2.
CONSENSUS: G-[STAD]-[LIVMT]-D-F-I-F-G.

NAME: Peptidyl-tRNA hydrolase signature 1.

CONSENSUS: [FY]-x(2)-T-R-H-N-x-G-x(2)-[LIVMFA](2)-[DE].

NAME: Peptidyl-tRNA hydrolase signature 2. 

CONSENSUS: [GS]-x(3)-H-N-G-[LIVM]-[KR]-[DNS]-[LIVMT].

NAME: Alkaline phosphatase active site. 

CONSENSUS: [IV]-x-D-S-[GAS]-[GASC]-[GAST]-[GA]-T.

NAME: Histidine acid phosphatases phosphohistidine signature. 

CONSENSUS: [LIVM]-x(2)-[LIVMA]-x(2)-[LIVM]-x-R-H-[GN]-x-R-x-[PAS].

NAME: Histidine acid phosphatases active site signature. 

CONSENSUS: [UVMFl-x-{UVMFAG]-x(2HSTAGI]-H-r>lSTANQl-x-IlJ^]-x^ 
CONSENSUS: [STA]. 

NAME: Class A bacterial acid phosphatases signature. 
CONSENSUS: G-S-Y-P-S-G-H-T.

NAME: 5'-nucleotidase signature 1.

CONSENSUS: [LIVM]-x-[LIVM](2)-[HEA]-[TI]-x-D-x-H-[GSA]-x-[LIVMF].

NAME: 5'-nucleotidase signature 2.

CONSENSUS: [FYP]-x(4)-[LIVM]-G-N-H-E-F-[DN].

NAME: Fructose-1-6-bisphosphatase active site.

CONSENSUS: [AG]-[RK]-L-x(1,2)-[LIV]-[FY]-E-x(2)-P-[LIVM]-[GSA].

NAME: Serine/threonine specific protein phosphatases signature. 
CONSENSUS: [LIVM]-R-G-N-H-E.

NAME: Protein phosphatase 2A regulatory subunit PR55 signature 1.
CONSENSUS: E-F-D-Y-L-K-S-L-E-I-E-E-K-I-N.

NAME: Protein phosphatase 2A regulatory subunit PR55 signature 2. 
CONSENSUS: N-[AG]-H-[TA]-Y-H-I-N-S-I-S-[LIVM]-N-S-D.

NAME: Protein phosphatase 2C signature. 

CONSENSUS: [LIVMFY]-[LIVMFYA]-[GSAC]-[LIVM]-[FYC]-D-G-H-[GAV].

NAME: Tyrosine specific protein phosphatases active site. 

CONSENSUS: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY].

NAME: Tyrosine specific protein phosphatases profile. 

NAME: Dual specificity protein phosphatase profile. 

NAME: PTP type protein phosphatase profile. 

NAME: Inositol monophosphatase family signature 1. 
CONSENSUS: [FWV]-x(OJ)-[lJrV^-I>P-[LIVM]-D-^ 

NAME: Inositol monophosphatase family signature 2. 

CONSENSUS: [WV]-D-x-[AC]-[GSA]-[GSAPV]-x-[LIVACP]-[LIV]-[LIVAC]-x(3)-[GH]-[GA].

NAME: Prokaryotic zinc-dependent phospholipase C signature. 
CONSENSUS: H-Y-x-[GT]-D-[LIVM]-[DNS]-x-P-x-H-[PA]-x-N.

NAME: Phosphatidylinositol-specific phospholipase X-box domain profile. 

NAME: Phosphatidylinositol-specific phospholipase Y-box domain profile. 

NAME: 3'5'-cyclic nucleotide phosphodiesterases signature.
CONSENSUS: H-D-[LIVMFY]-x-H-x-[AG]-x(2)-[NQ]-x-[LIVMFY].

NAME: cAMP phosphodiesterases class-II signature. 

CONSENSUS: H-x-H-L-D-H-[LIVM]-x-[GS]-[LIVMA]-[LIVM](2)-x-S-[AP].

NAME: Sulfatases signature 1.

CONSENSUS: [SAP]-[LIVMST]-[CS]-[STAC]-P-[STA]-R-x(2)-[LIVMFW](2)-[TR]-G.

NAME: Sulfatases signature 2.

CONSENSUS: G-[YV]-x-[ST]-x(2)-[IVA]-G-K-x(0,1)-[FYWK]-[HL].

NAME: AP endonucleases family 1 signature 1.
CONSENSUS: [APF]-D-[LIVMF](2)-x-[LIVM]-Q-E-x-K.

NAME: AP endonucleases family 1 signature 2. 

CONSENSUS: D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2).

NAME: AP endonucleases family 1 signature 3. 

CONSENSUS: N-x-G-x-R-[LIVM]-D-[LIVMFYH]-x-[LV]-x-S.

NAME: AP endonucleases family 2 signature 1.
CONSENSUS: H-x(2)-Y-[LIVMF]-[IM]-N-[LIVMCA]-[AG].

NAME: AP endonucleases family 2 signature 2. 
CONSENSUS: [GR]-[LIVMF]-C-[LIVM]-D-T-C-H.

NAME: AP endonucleases family 2 signature 3. 

CONSENSUS: [LIVMW]-H-x-N-[DE]-[SA]-K-x(3)-G-[SA]-x(2)-D.

NAME: Deoxyribonuclease I signature 1.

CONSENSUS: [LIVM](2)-[AP]-L-H-[STA](2)-P-x(5)-E-[LIVM]-[DN]-x-L-x-[DE]-V.

NAME: Deoxyribonuclease I signature 2.
CONSENSUS: G-D-F-N-A-x-C-[SA]. 

NAME: Endonuclease III iron-sulfur binding region signature. 
CONSENSUS: C-x(3)-[KRS]-P-[KRAGL]-C-x(2)-C-x(5)-C.

NAME: Endonuclease III family signature. 

CONSENSUS: [GST]-x-[LIVMF]-P-x(5)-[LIVMW]-x(2,3)-[LI]-[PAS]-G-V-[GA]-x(3)-[GAC]-
CONSENSUS: x(3)-[LIVM]-x(2)-[SALV]-[LIVMFYW]-[GANK].

NAME: Ribonuclease II family signature. 

CONSENSUS: [HIMFYEHGSTAMHUVM}-x(4,5)-Y-[STAL]-x-^ 
CONSENSUS: [RQ]-[KR]-[FY]-x-D-x(3)-[HQ]. 

NAME: Ribonuclease III family signature. 

CONSENSUS: [DEQ]-[RQ]-[LM]-E-[FYW]-[LV]-G-D-[SAR].

NAME: Bacterial Ribonuclease P protein component signature. 

CONSENSUS: [LIVMFYS]-x(2)-A-x(2)-R-[NH]-[KRQL]-[LIVM]-[KRA]-R-x-[LIVMTA]-[KR].

NAME: Ribonuclease T2 family histidine active site 1.
CONSENSUS: [FYWL]-x-[LIVM]-H-G-L-W-P.

NAME: Ribonuclease T2 family histidine active site 2. 

CONSENSUS: [LIVMF]-x(2)-[HDGTY]-[EQ]-[FYW]-x-[KR]-H-G-x-C.

NAME: Pancreatic ribonuclease family signature. 
CONSENSUS: C-K-x(2)-N-T-F. 

NAME: DNA/RNA non-specific endonucleases active site. 
CONSENSUS: D-R-G-H-[QIL]-x(3)-A. 

NAME: Thermonuclease family signature 1.

CONSENSUS: D-G-D-T-[LIVM]-x-[LIVMC]-x(9,10)-R-[LIVM]-x(2)-[LIVM]-D-x-P-E.

NAME: Thermonuclease family signature 2. 

CONSENSUS: D-[KR]-Y-[GQ]-R-x-[LV]-[GA]-x-[IV]-[FYW].

NAME: Beta-amylase active site 1.
CONSENSUS: H-x-C-G-G-N-V-G-D.

NAME: Beta-amylase active site 2.

CONSENSUS: G-x-[SA]-G-E-[LIVM]-R-Y-P-S-Y.

NAME: Glucoamylase active site region signature. 
CONSENSUS: [STN]-[GP]-x(1,2)-[DE]-x-W-E-E-x(2)-[GS].

NAME: Polygalacturonase active site. 

CONSENSUS: [GSDENKRH]-x(2)-[VMFC]-x(2)-[GS]-H-G-[LIVMAG]-x(1,2)-[LIVM]-G-S.

NAME: Clostridium cellulosome enzymes repeated domain signature.

CONSENSUS: D-[LIVMFY]-[DNV]-x-[DNS]-x(2)-[LIVM]-[DN]-[SALM]-x-D-x(3)-[LIVMF]-x-
CONSENSUS: [RKS]-x-[LIVMF].

NAME: Chitinases family 18 active site. 

CONSENSUS: [LIVMFY]-[DN]-G-[LIVMF]-[DN]-[LIVMF]-[DN]-x-E.

NAME: Chitinases family 19 signature 1.

CONSENSUS: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-[YF]-x(2)-F-[GSA].

NAME: Chitinases family 19 signature 2.

CONSENSUS: [UVMMGSA)-F-x-[STAGl(2)-[UVMFY^^ 

NAME: Alpha-lactalbumin / lysozyme C signature.
CONSENSUS: C-x(3)-C-x(2)-[LMF]-x(3)-[DEN]-[LI]-x(5)-C.

NAME: Alpha-galactosidase signature.

CONSENSUS: G-{LIVMFY1-x(2)-[LIVMFn-*-a^ . 
NAME: Trehalase signature 1.
CONSENSUS: P-G-G-R-F-x-E-x-Y-x-W-D-x-Y.

NAME: Trehalase signature 2. 

CONSENSUS: Q-W-D-x-P-x-[GA]-W-[PA]-P. 

NAME: Alpha-L-fucosidase putative active site.
CONSENSUS: P-x(2)-L-x(3)-K-W-E-x-C. 

NAME: Glycosyl hydrolases family 1 active site.

CONSENSUS: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN].

NAME: Glycosyl hydrolases family 1 N-terminal signature.

CONSENSUS: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA].

NAME: Glycosyl hydrolases family 2 signature 1.

CONSENSUS: N-x-[LIVMFYWD]-R-[STACN](2)-H-Y-P-x(4)-[LIVMFYW](2)-x(3)-[DN]-x(2)-
CONSENSUS: G-[LIVMFYW](4). 

NAME: Glycosyl hydrolases family 2 acid/base catalyst. 

CONSENSUS: [DENQF]-[KRVW]-N-H-[AP]-[SAC]-[LIVMF](3)-W-[GS]-x(2,3)-N-E.

NAME: Glycosyl hydrolases family 3 active site.

CONSENSUS: [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-[LIVT]-[LIVMF]-[ST]-D-x(2)-
CONSENSUS: [SGADNI].

NAME: Glycosyl hydrolases family 5 signature. 

CONSENSUS: [LIV]-[LIVMFYWGA](2)-[DNEQG]-[LIVMGST]-x-N-E-[PV]-[RHDNSTLIVFY].

NAME: Glycosyl hydrolases family 6 signature 1. 

CONSENSUS: V-x-Y-x(2)-P-x-R-D-C-[GSAF]-x(2)-[GSA](2)-x-G.

NAME: Glycosyl hydrolases family 6 signature 2. 

CONSENSUS: [LIVMYA]-[LIVA]-[LIVT]-[LIV]-E-P-D-[SAL]-[LI]-[PSAG].

NAME: Glycosyl hydrolases family 8 signature.

CONSENSUS: A-[ST]-D-[AG]-D-x(2)-[IM]-A-x-[SA]-[LIVM]-[LIVMG]-x-A-x(3)-[FW].

NAME: Glycosyl hydrolases family 9 active sites signature 1.

CONSENSUS: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R.

NAME: Glycosyl hydrolases family 9 active sites signature 2.
CONSENSUS: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA].

NAME: Glycosyl hydrolases family 10 active site. 

CONSENSUS: [GTA]-x(2)-[LIVN]-x-[IVMF]-[ST]-E-[LIY]-[DN]-[LIVMF].

NAME: Glycosyl hydrolases family 11 active site signature 1.
CONSENSUS: [PSA]-[LQ]-x-E-Y-Y-[LIVM](2)-[DE]-x-[FYWHN].

NAME: Glycosyl hydrolases family 11 active site signature 2.

CONSENSUS: [LIVMF]-x(2)-E-[AG]-[YWG]-[QRFGS]-[SG]-[STAN]-G-x-[SAF].

NAME: Glycosyl hydrolases family 16 active sites.

CONSENSUS: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA].

NAME: Glycosyl hydrolases family 17 signature.

CONSENSUS: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ].

NAME: Glycosyl hydrolases family 25 active sites signature. 
CONSENSUS: r>[UVM]-x(3)-P<Q]-[PG]-x<9,10)^ 
CONSENSUS: Y-x-[DN]. 

NAME: Glycosyl hydrolases family 31 active site. 
CONSENSUS: [GF]-[LIVMF]-W-x-D-M-[NSA]-E.

NAME: Glycosyl hydrolases family 31 signature 2. 

CONSENSUS: G-[AV]-D-[LIVMT]-C-G-[FY]-x(3)-[ST]-x(3)-L-C-x-R-W-x(2)-[LV]-[GS]-[SA]-
CONSENSUS: F-x-P-F-x-R-[DN].

NAME: Glycosyl hydrolases family 32 active site. 
CONSENSUS: H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G.

NAME: Glycosyl hydrolases family 35 putative active site. 
CONSENSUS: G-G-P-[LIVM](2)-x(2)-Q-x-E-N-E-[FY].

NAME: Glycosyl hydrolases family 39 active site.
CONSENSUS: W-x-F-E-x-W-N-E-P-[DN].

NAME: Glycosyl hydrolases family 45 active site. 
CONSENSUS: [STA]-T-R-Y-[FYW]-D-x(5)-[CA].

NAME: Prokaryotic transglycosylases signature. 

CONSENSUS: [LIVM]-x(3)-E-S-x(3)-[AP]-x(3)-S-x(5)-G-[LIVM]-[LIVMFYW]-x-[LIVMFYW]-
CONSENSUS: x(4)-[SAG].

NAME: Inosine-uridine preferring nucleoside hydrolase family signature. 
CONSENSUS: D-x-D-[PT]-[GA]-x-D-D-[TAV]-[VI]-A.

NAME: Alkylbase DNA glycosidases alkA family signature. 

CONSENSUS: G-I-G-x-W-[ST]-[AV]-x-[LIVMFY](2)-x-[LIVM]-x(8)-[MF]-x(2)-[ED]-D.

NAME: Formamidopyrimidine-DNA glycosylase signature.

CONSENSUS: C-x(2,4)-C-x-[GTAQ]-x-[IV]-x(7)-R-[GSTAN]-[STA]-x-[FYI]-C-x(2)-C-Q.

NAME: Uracil-DNA glycosylase signature. 

CONSENSUS: [KR]-[LIV]-[LIVC]-[LIVM]-x-G-[QI]-D-P-Y.

NAME: S-adenosyl-L-homocysteine hydrolase signature 1. 

CONSENSUS: [CS]-N-x-[FYL]-S-[ST]-[QA]-[DEN]-x-[AV](2)-A-A-[LIV]-[SAV].

NAME: S-adenosyl-L-homocysteine hydrolase signature 2.
CONSENSUS: G-K-x(3)-[LIV]-x-G-Y-G-x-V-G-[KR]-G-x-A.

NAME: Cytosol aminopeptidase signature. 
CONSENSUS: N-T-D-A-E-G-R-L. 

NAME: Aminopeptidase P and proline dipeptidase signature. 

CONSENSUS: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE].

NAME: Methionine aminopeptidase subfamily 1 signature.

CONSENSUS: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV].

NAME: Methionine aminopeptidase subfamily 2 signature.

CONSENSUS: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN].

NAME: Renal dipeptidase active site.

CONSENSUS: [LIVM]-E-G-[GA]-x(2)-[LIVMF]-x(6)-L-x(3)-Y-x(2)-G-[LIVM]-R.

NAME: Serine carboxypeptidases, serine active site. 
CONSENSUS: [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS].

NAME: Serine carboxypeptidases, histidine active site.

CONSENSUS: [LIVF]-x(2)-[LIVSTA]-x-[IVPST]-x-[GSDNQL]-[SAGV]-[SG]-H-x-[IVAQ]-P-x(3)-
CONSENSUS: [PSA].

NAME: Zinc carboxypeptidases, zinc-binding region 1 signature.
CONSENSUS: [PK]-x-[UVMFY]-x-[U\MFY]-x^ 
CONSENSUS: ILIVMFYTAJ. 

NAME: Zinc carboxypeptidases, zinc-binding region 2 signature. 
CONSENSUS: H-[STAG]-x(3)-[LIVME]-x(2)-[LIVMFYW]-P-[FYW].

NAME: Serine proteases, trypsin family, histidine active site.
CONSENSUS: [LIVM]-[ST]-A-[STAG]-H-C.

NAME: Serine proteases, trypsin family, serine active site. 

CONSENSUS: [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-
CONSENSUS: [LIVMFYSTANQH].

NAME: Serine proteases, subtilase family, aspartic acid active site.

CONSENSUS: [STAIV]-x-[LIVMF]-[LIVM]-D-[DSTA]-G-[LIVMFC]-x(2,3)-[DNH].

NAME: Serine proteases, subtilase family, histidine active site.

CONSENSUS: H-G-[STM]-x-[VIC]-[STAGC]-[GS]-x-[LIVMA]-[STAGCLV]-[SAGM].

NAME: Serine proteases, subtilase family, serine active site.
CONSENSUS: G-T-S-x-[SA]-x-P-x(2)-[STAVC]-[AG].

NAME: Serine proteases, V8 family, histidine active site.

CONSENSUS: [ST]-G-[LIVMFYW](3)-[GN]-x(2)-T-[LIVM]-x-T-x(2)-H.

NAME: Serine proteases, V8 family, serine active site. 
CONSENSUS: T-x(2)-[GC]-[NQ]-S-G-S-x-[LIVM]-[FY].

NAME: Serine proteases, omptin family signature 1.
CONSENSUS: W-T-D-x-S-x-H-P-x-T. 

NAME: Serine proteases, omptin family signature 2. 

CONSENSUS: A-G-Y-Q-E-[ST]-R-[FYW]-S-[FYW]-[TN]-A-x-G-G-[ST]-Y.

NAME: Prolyl endopeptidase family serine active site.

CONSENSUS: D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2).

NAME: Endopeptidase Clp serine active site.

CONSENSUS: T-x(2)-[LIVMF]-G-x-A-[SAC]-S-[MSA]-[PAG]-[STA].

NAME: Endopeptidase Clp histidine active site.

CONSENSUS: R-x(3)-[EAP]-x(3)-[LIVMFYT]-M-[LIVM]-H-Q-P.

NAME: ATP-dependent serine proteases, lon family, serine active site.
CONSENSUS: D-G-[PD]-S-A-[GS]-[LIVMCA]-[TA]-[LIVM].

NAME: Eukaryotic thiol (cysteine) proteases cysteine active site. 
CONSENSUS: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV].

NAME: Eukaryotic thiol (cysteine) proteases histidine active site.

CONSENSUS: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH].

NAME: Eukaryotic thiol (cysteine) proteases asparagine active site. 
CONSENSUS: [FYOT]-[WO-[LIVT}-v[KRQAG)-N^ 
CONSENSUS: [LIVMFYG]-x-[LIVMF] . 

NAME: Ubiquitin carboxyl-terminal hydrolase family 1 cysteine active-site.
CONSENSUS: Q-x(3)-N-[SA]-C-G-x(3)-[LIVM](2)-H-[SA]-[LIVM]-[SA].

NAME: Ubiquitin carboxyl-terminal hydrolases family 2 signature 1.
CONSENSUS: GHLIVMFY]-x(l,3HAGC]-[NASM]-x-C-[FYW^ 
CONSENSUS: Q. 

NAME: Ubiquitin carboxyl-terminal hydrolases family 2 signature 2. 
CONSENSUS: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y.

NAME: Caspase family histidine active site.

CONSENSUS: H-x(2,4)-[SC]-x(4)-[LIVMF](2)-[ST]-H-G.

NAME: Caspase family cysteine active site.
CONSENSUS: K-P-K-[LIVMF](4)-Q-A-C-[RQG]-G.

NAME: Eukaryotic and viral aspartyl proteases active site. 

CONSENSUS: rLrVTvJFGACJ-lUVMTADNl-lUVFSA].D-{ST]^-[STAVl-(STAPDENQJ-x.rLIVMF^ 
CONSENSUS: x-[UVMFGTA]. 

NAME: Neutral zinc metallopeptidases, zinc-binding region signature. 

CONSENSUS: [GSTALIVN]-x(2VH-E-[UVMFYW].{DEHRKP}-H-x-[LIVMFnVGSPQJ^ 

NAME: Matrixins cysteine switch. 

CONSENSUS: P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ].

NAME: Insulinase family, zinc-binding region signature. 
CONSENSUS: G-x(8,9)-G-x-[STA]-H-[UVMFY]-[LIVM 
CONSENSUS: [GSTAN]-[GSTJ. 

NAME: Glycoprotease family signature (PROSITE accession PS01016).

CONSENSUS: [KR]-[GSAT]-x(4)-[FYWLH]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-
CONSENSUS: [LIVM].

NAME: Proteasome A-type subunits signature.
CONSENSUS: [FY^x(4MST^^v^-x-[FYW^S-P-x^^ 
CONSENSUS: [SAG]. 
NAME: Proteasome B-type subunits signature.

CONSENSUS: [LIVMA]-[GSA]-[LIVMF]-x-[FYLVGAC]-x(2)-[GSACFY]-[LIVMSTAC](3)-[GAC]-
CONSENSUS: [GSTACV]-[DES]-x(15)-[RK]-x(12,13)-G-x(2)-[GSTA]-D.

NAME: Signal peptidases I serine active site.
CONSENSUS: [GS]-x-S-M-x-[PS]-[AT]-[LF].

NAME: Signal peptidases I lysine active site. 

CONSENSUS: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY].

NAME: Signal peptidases I signature 3.

CONSENSUS: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG].

NAME: Signal peptidases II signature.

CONSENSUS: [GAF]-[GA]-[GAS]-[LIVM]-[GAS]-N-[LVMFG]-[LIVMFY]-D-R-[LIMFA].

NAME: Peptidase family U32 signature.

CONSENSUS: E-x-F-x(2)-G-[SA]-[LIVM]-C-x(4)-G-x-C-x-[LIVM]-S.

NAME: Amidases signature.

CONSENSUS: G-[GA]-S-S-[GS]-G-x-[GSA]-[GSAVY]-x-[LIVM]-[GSA]-x(6)-[GSA]-x-[GA]-x-D-
CONSENSUS: x-[GA]-x-S-[LIVM]-R-x-P-[GSAC]. 

NAME: Asparaginase / glutaminase active site signature 1.
CONSENSUS: [LIVM]-x(2)-T-G-G-T-[IV]-[AGS].

NAME: Asparaginase / glutaminase active site signature 2.
CONSENSUS: G-x-[LIVM]-x(2)-H-G-T-D-T-[LIVM].

NAME: Urease nickel ligands signature. 

CONSENSUS: T-[AY]-[GA]-[GAT]-[LIVM]-D-x-H-[LIVM]-H-x(3)-P.
NAME: Urease active site. 

CONSENSUS: [UVM](2MCn>H-[HN]-L-x^ 

NAME: ArgE / dapE / ACY1 / CPG2 / yscS family signature 1.
CONSENSUS: [LIV]-[GALMY]-[LIVMF]-x-[GSA]-H-x-D-[TV]-[STAV].

NAME: ArgE / dapE / ACY1 / CPG2 / yscS family signature 2.
CONSENSUS: [GSTAfl-[SAN<a-r>-x-K-[GSACN]-x(2MLW 
CONSENSUS: x-[UVMF]-[LIVMSTAG}-[LIVMFA]-x(2)-[DNG]-E-E-x-lGSTNl. 

NAME: Dihydroorotase signature 1. 

CONSENSUS: D-[LIVMFYWSAP]-H-[LIVA]-H-[LIVF]-[RN]-x-[PGN].

NAME: Dihydroorotase signature 2.
CONSENSUS: [GA]-[ST]-D-x-A-P-H-x(4)-K.

NAME: Beta-lactamase class-A active site. 

CONSENSUS: [FY]-x-[LIVMFY]-x-S-[TV]-x-K-x(4)-[AGLM]-x(2)-[LC].

NAME: Beta-lactamase class-C active site.
CONSENSUS: F-E-[LIVM]-G-S-[LIVMG]-[SA]-K.

NAME: Beta-lactamase class-D active site.

CONSENSUS: [PA]-x-S-[ST]-F-K-[LIV]-[PAL]-x-[STA]-[LI].

NAME: Beta-lactamases class B signature 1.

CONSENSUS: [LI]-x-[STN]-[HN]-x-H-[GSTA]-D-x(2)-G-[GP]-x(7,8)-[GS].

NAME: Beta-lactamases class B signature 2.

CONSENSUS: P-x(3)-[LIVM](2)-x-G-x-C-[LIVMF](2)-K.

NAME: Arginase family signature 1.

CONSENSUS: [LIVMF]-G-G-x-H-x-[LIVMT]-[STAV]-x-[PAG]-x(3)-[GSTA].

NAME: Arginase family signature 2.

CONSENSUS: [LIVM](2)-x-[LIVMFY]-D-[AS]-H-x-D.

NAME: Arginase family signature 3.
CONSENSUS: [ST]-[LIVMr^-D-{U\™^ 

NAME: Adenosine and AMP deaminase signature. 
CONSENSUS: [SA]-[LIVM]-[NGS]-[STA]-D-D-P.

NAME: Cytidine and deoxycytidylate deaminases zinc-binding region signature. 

CONSENSUS: [CH]-[AGV]-E-x(2)-[LIVMFGAT]-[LIVM]-x(17,33)-P-C-x(2,8)-C-x(3)-[LIVM].

NAME: GTP cyclohydrolase I signature 1.

CONSENSUS: [EN]-[LIVM](2)-x(2)-[KRQN]-[DN]-[LIVM]-x(3)-[ST]-x-C-E-H-H.

NAME: GTP cyclohydrolase I signature 2.

CONSENSUS: [SA]-x-[RK]-x-Q-[LIVM]-Q-E-[RN]-[LI]-[TSN].

NAME: Nitrilases / cyanide hydratase signature 1.

CONSENSUS: G-x(2)-[LIVMFY](2)-x-[IF]-x-E-x(2)-[LIVM]-x-G-Y-P.

NAME: Nitrilases / cyanide hydratase active site signature.

CONSENSUS: G-[GAQ]-x(2)-C-[WA]-E-[NH]-x(2)-[PST]-[LIVMFYS]-x-[KR].

NAME: Inorganic pyrophosphatase signature. 

CONSENSUS: D-[SGDN]-D-[PE]-[LIVMF]-D-[LIVMGAC].

NAME: Acylphosphatase signature 1.
CONSENSUS: [LIV]-x-G-x-V-Q-G-V-x-[FM]-R.

NAME: Acylphosphatase signature 2.

CONSENSUS: G-[FYW]-[AVC]-[KRQAM]-N-x(3)-G-x-V-x(5)-G.

NAME: ATP synthase alpha and beta subunits signature.
CONSENSUS: P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S.

NAME: ATP synthase gamma subunit signature.
CONSENSUS: [IV]-T-x-E-x(2)-[DE]-x(3)-G-A-x-[SAKR].

NAME: ATP synthase delta (OSCP) subunit signature. 
CONSENSUS: [UVM]-x-[LIVMFm-x<3MLIVNrn^^ 
CONSENSUS: x-[LIVM]-[KRHENQ]-x-[GSEN]. 

NAME: ATP synthase a subunit signature. 

CONSENSUS: [STAGN]-x-[STAG]-[LIVMF]-R-L-x-[SAGV]-N-[LIVMT]. 
NAME: ATP synthase c subunit signature. 

CONSENSUS: [GSTA]-R-[NQ]-P-x(10)-[LIVMFYW](2)-x(3)-[LIVMFYW]-x-[DE].

NAME: E1-E2 ATPases phosphorylation site.
CONSENSUS: D-K-T-G-T-[LI]-[TI].

NAME: Sodium and potassium ATPases beta subunits signature 1.

CONSENSUS: [FYW]-x(2)-[FYW]-x-[FYW]-[DN]-x(6)-[LIVM]-G-R-T-x(3)-W.

NAME: Sodium and potassium ATPases beta subunits signature 2.
CONSENSUS: [RK]-x(2)-C-[RKQWI]-x(5)-L-x(2)-C-[SA]-G.

NAME: GDA1/CD39 family of nucleoside phosphatases signature. 

CONSENSUS: [LIVM]-x-G-x(2)-E-G-x-[FY]-x-[FW]-[LIVA]-[TAG]-x-N-[HY].

NAME: Iodothyronine deiodinases active site. 
CONSENSUS: R-P-L-V-x-N-F-G-S-[CA]-T-C-P-x-F. 

NAME: Cutinase, serine active site.

CONSENSUS: P-x-[STA]-x-[LIV]-[IVT]-x-[GS]-G-Y-S-[QL]-G.

NAME: Cutinase, aspartate and histidine active sites.

CONSENSUS: C-x(3)-D-x-[IV]-C-x-G-[GST]-x(2)-[LIVM]-x(2,3)-H.

NAME: DDC / GAD / HDC / TyrDC pyridoxal-phosphate attachment site.
CONSENSUS: S-0JVMFYW]-x(5)-K-[UVMF™G^^ 
CONSENSUS: x(2MRK]. 

NAME: Orn/Lys/Arg decarboxylases family 1 pyridoxal-P attachment site.
CONSENSUS: [STAV]-x-S-x-H-K-x(2)-[GSTAN](2)-x-[STA]-Q-[STA](2).

NAME: Orn/DAP/Arg decarboxylases family 2 pyridoxal-P attachment site. 
CONSENSUS: (FYMPA)-x-K-{SACV]-{NHCU^-xtf^ 
CONSENSUS: [GTE]. 
NAME: Orn/DAP/Arg decarboxylases family 2 signature 2.

CONSENSUS: [GS]-x(2,6)-[LIVMSCP]-x(2)-[LIVMF]-[DNS]-[LIVMCA]-G-G-G-[LIVMFY]-
CONSENSUS: [GSTPCEQ].

NAME: Orotidine 5'-phosphate decarboxylase active site.

CONSENSUS: [LIVMFTA]-[LIVMF]-x-D-x-K-x(2)-D-I-[GP]-x-T-[LIVMTA].

NAME: Phosphoenolpyruvate carboxylase active site 1. 
CONSENSUS: [VT]-x-T-A-H-P-T-[EQ]-x(2)-R-[KRH]. 

NAME: Phosphoenolpyruvate carboxylase active site 2.
CONSENSUS: [IV]-M-[LIVM]-G-Y-S-D-S-x-K-D-[STAG]-G.

NAME: Phosphoenolpyruvate carboxykinase (GTP) signature.
CONSENSUS: F-P-S-A-C-G-K-T-N.

NAME: Phosphoenolpyruvate carboxykinase (ATP) signature.
CONSENSUS: L-I-G-D-D-E-H-x-W-x-[DE]-x-G-[IV]-x-N.

NAME: Uroporphyrinogen decarboxylase signature 1.
CONSENSUS: P-x-W-x-M-R-Q-A-G-R. 

NAME: Uroporphyrinogen decarboxylase signature 2. 

CONSENSUS: G-F-[STAGCV]-[STAGC]-x-P-[FYW]-T-[LV]-x(2)-Y-x(2)-[AE]-[GK].

NAME: Indole-3-glycerol phosphate synthase signature. 
CONSENSUS: [UVMFYHLF/MC]-x-E-[UVM^ 

NAME: Ribulose bisphosphate carboxylase large chain active site.
CONSENSUS: G-x-[DN]-F-x-K-x-D-E.

NAME: Fructose-bisphosphate aldolase class-I active site.
CONSENSUS: [LIVM]-x-[LIVMFYW]-E-G-x-[LS]-L-K-P-[SN].

NAME: Fructose-bisphosphate aldolase class-II signature 1.

CONSENSUS: [FYVM]-x(1,3)-[LIVMH]-[APN]-[LIVM]-x(1,2)-[LIVM]-H-x-D-H-[GACH].

NAME: Fructose-bisphosphate aldolase class-II signature 2.
CONSENSUS: [LIVM]-E-x-E-[LIVM]-G-x(2)-[GM]-[GSTA]-x-E.

NAME: Malate synthase signature.

CONSENSUS: [KR]-[DENQ]-H-x(2)-G-L-N-x-G-x-W-D-Y-[LIVM]-F.

NAME: Hydroxymethylglutaryl-coenzyme A lyase active site. 
CONSENSUS: S-V-A-G-L-G-G-C-P-Y. 

NAME: Hydroxymethylglutaryl-coenzyme A synthase active site. 
CONSENSUS: N-x-[DN]-[IV]-E-G-[IV]-D-x(2)-N-A-C-[FY]-x-G.

NAME: Citrate synthase signature.

CONSENSUS: G-[FYA]-[GA]-H-x-[IV]-x(1,2)-[RKT]-x(2)-D-[PS]-R.

NAME: Alpha-isopropylmalate and homocitrate synthases signature 1.
CONSENSUS: L-R-[DE]-G-x-Q-x(10)-K.

NAME: Alpha-isopropylmalate and homocitrate synthases signature 2.
CONSENSUS: [LIVMFW]-x(2)-H-x-H-[DN]-D-x-G-x-[GAS]-x-[GASLI].

NAME: KDPG and KHG aldolases active site.
CONSENSUS: G-[LIVM]-x(3)-E-[LIV]-T-[LF]-R.

NAME: KDPG and KHG aldolases Schiff-base forming residue.
CONSENSUS: G-x(3)-[LIVMF]-K-[LF]-F-P-[SA]-x(3)-G.

NAME: Isocitrate lyase signature. 
CONSENSUS: K-[KR]-C-G-H-[LMQ]. 

NAME: Beta-eliminating lyases pyridoxal-phosphate attachment site. 
CONSENSUS: Y-x-D-x(3)-M-S-[GA]-K-K-D-x-[LIVM](2)-x-[LIVM]-G-G.

NAME: DNA photolyases class 1 signature 1.

CONSENSUS: T-G-x-P-[LIVM](2)-D-A-x-M-[RA]-x-[LIVM].

NAME: DNA photolyases class 1 signature 2.
CONSENSUS: [DN]-R-x-R-[LIVM](2)-x-[STA](2)-F-[LIVMFA]-x-K-x-L-x(2,3)-W-[KRQ].

NAME: DNA photolyases class 2 signature 1. 
CONSENSUS: F-x-E-E-x-[LIVM](2)-R-R-E-L-x(2)-N-F. 

NAME: DNA photolyases class 2 signature 2.

CONSENSUS: G-x-H-D-x(2)-W-x-E-R-x-[LIVM]-F-G-K-[LIVM]-R-[FY]-M-N.

NAME: Eukaryotic-type carbonic anhydrases signature.

CONSENSUS: S-E-H-x-[LIVM]-x(4)-[FYH]-x(2)-E-[LIVM]-H-[LIVMFA](2).

NAME: Prokaryotic-type carbonic anhydrases signature 1.
CONSENSUS: C-[SA]-D-S-R-[LIVM]-x-[AP].

NAME: Prokaryotic-type carbonic anhydrases signature 2. 

CONSENSUS: [EQ]-Y-A-[LIVM]-x(2)-[LIVM]-x(4)-[LIVMF](3)-x-G-H-x(2)-C-G. 

NAME: Fumarate lyases signature. 
CONSENSUS: G-S-x(2)-M-x(2)-K-x-N. 

NAME: Aconitase family signature 1. 

CONSENSUS: [LIVM]-x(2)-[GSACIVM]-x-[LIV]-[GTIV]-[STP]-C-x(0,1)-T-N-[GSTANI]-x(4)-
CONSENSUS: [LIVMA].

NAME: Aconitase family signature 2.

CONSENSUS: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA].

NAME: Dihydroxy-acid and 6-phosphogluconate dehydratases signature 1.
CONSENSUS: C-D-K-x(2)-P-[GA]-x(3)-[GA].

NAME: Dihydroxy-acid and 6-phosphogluconate dehydratases signature 2.
CONSENSUS: [SA]-L-[LIVM]-T-D-[GA]-R-[LIVMF]-S-[GA]-[GAVMST].

NAME: Dehydroquinase class I active site. 

CONSENSUS: D-[LIVM]-[DE]-[LIVN]-x(18,20)-[LIVM](2)-x-[SC]-[NHY]-H-[DN].

NAME: Dehydroquinase class II signature.

CONSENSUS: [LIVM]-[NQ]-G-P-N-[LV]-x(2)-L-G-x-R-[QED]-P-x(2)-[FY]-G.

NAME: Enolase signature.

CONSENSUS: [LIV](3)-K-x-N-Q-I-G-[ST]-[LIV]-[ST]-[DE]-[STA].

NAME: Serine/threonine dehydratases pyridoxal-phosphate attachment site.

CONSENSUS: [DESH]-x(4,5)-[STVG]-x-[AS]-[FYI]-K-[DLIFSA]-[RVMF]-[GA]-[LIVMGA].

NAME: Enoyl-CoA hydratase/isomerase signature.

CONSENSUS: [LIVM]-[STA]-x-[LIVM]-[DENQRHSTA]-G-x(3)-[AG](3)-x(4)-[LIVMST]-x-[CSTA]-
CONSENSUS: [DQHP]-[LIVMFY].

NAME: Imidazoleglycerol-phosphate dehydratase signature 1.

CONSENSUS: [LIVMY]-[DE]-x-H-H-x(2)-E-x(2)-[GCA]-[LIVM]-[STAC]-[LIVM].

NAME: Imidazoleglycerol-phosphate dehydratase signature 2.
CONSENSUS: G-x-[DN]-x-H-H-x(2)-E-[STAGC]-x-[FY]-K.

NAME: Tryptophan synthase alpha chain signature. 
CONSENSUS: [LIY7^-E-[UVM]-G-x(2MF^^ 

NAME: Tryptophan synthase beta chain pyridoxal-phosphate attachment site. 
CONSENSUS: [LIVM]-x-H-x-G-[STA]-H-K-x-N.

NAME: Delta-aminolevulinic acid dehydratase active site.
CONSENSUS: G-x-D-x-[LIVM](2)-[IV]-K-P-[GSA]-x(2)-Y.

NAME: Urocanase active site. 
CONSENSUS: F-Q-G-L-P-x-R-I-C-W. 

NAME: Prephenate dehydratase signature 1.
CONSENSUS: [F^x-[UVM]-x(2)-[UVM]-x(5)-^^ 

NAME: Prephenate dehydratase signature 2. 
CONSENSUS: [LIVM]-[ST]-[KR]-[LIVM]-E-[ST]-R-P.

NAME: Dihydrodipicolinate synthetase signature 1.
CONSENSUS: [GSA]-[LIVM]-[LIVMFY]-x(2)-G-[ST]-[TG]-G-E-[GASNF]-x(6)-[EQ].

NAME: Dihydrodipicolinate synthetase signature 2.

CONSENSUS: Y-[DNS]-[LIVMF]-P-x(2)-[ST]-x(3)-[LIVM]-x(13,14)-[LIVM]-x-[SGA]-[LIVMF]-
CONSENSUS: K-[DEQAF]-[STAC].

NAME: RsuA family of pseudouridine synthase signature.
CONSENSUS: G-R-L-D-x(2)-[ST]-x-G-[LIVMF](4)-[ST]-[DNT].

NAME: Cysteine synthase/cystathionine beta-synthase P-phosphate attachment site.
CONSENSUS: K-x-E-x(3)-[PA]-[STAGC]-x-S-[IVAP]-K-x-R-x-[STAG]-x(2)-[LIVM].

NAME: Phenylalanine and histidine ammonia-lyases signature.

CONSENSUS: G-[STG]-[LIVM]-[STG]-[AC]-S-G-[DH]-L-x-P-L-[SA]-x(2)-[SA].

NAME: Porphobilinogen deaminase cofactor-binding site.

CONSENSUS: E-R-x-[LIVMFA]-x(3)-[LIVMF]-x-G-[GSA]-C-x-[IVT]-P-[LIVMF]-[GSA].

NAME: Cys/Met metabolism enzymes pyridoxal-phosphate attachment site. 
CONSENSUS: [rXH-[UVMr^x(3MSTACK:HSTArc^ 

NAME: Glyoxalase I signature 1. 

CONSENSUS: lH(y-[IVT]-x-[UVr^-x-[IVM 

NAME: Glyoxalase I signature 2. 

CONSENSUS: G-[NTKQ]-x(0,5)-[GA]-[LVFY]-[GH]-H-[IVF]-[CGA]-x-[STAGL]-x(2)-[DNC].

NAME: Cytochrome c and c1 heme lyases signature 1.
CONSENSUS: H-N-x(2)-N-E-x(2)-W-[NQKR]-x(4)-W-E.

NAME: Cytochrome c and c1 heme lyases signature 2.
CONSENSUS: P-F-D-R-H-D-W. 

NAME: Adenylate cyclases class-I signature 1. 
CONSENSUS: E-Y-F-G-[SA](2)-L-W-x-L-Y-K. 

NAME: Adenylate cyclases class-I signature 2. 

CONSENSUS: Y-R-N-x-W-[NS]-E-[LIVM]-R-T-L-H-F-x-G.

NAME: Guanylate cyclases signature. 

CONSENSUS: G-V-[LIVM]-x(0,1)-G-x(5)-[FY]-x-[LIVM]-[FYW]-[GS]-[DNTHKW]-[DNT]-[IV]-
CONSENSUS: [DNTA]-x(5)-[DE]. 

NAME: Chorismate synthase signature 1. 

CONSENSUS: G-E-S-H-[CK3-x(2MLIVMMGTV]-x-[^ 

NAME: Chorismate synthase signature 2. 

CONSENSUS: [GE]-R-[SA](2)-[SAG3-R-[EVl-[STl-x(2)-[RH]-V-x(2)^. 
NAME: Chorismate synthase signature 3. 

CONSENSUS: R-[SH]-D-[PSV]-[CSAV]-x(4)-[GAI]-x-[IVGSP]-[LIVM]-x-E-[STAH]-[LIVM].

NAME: 6-pyruvoyl tetrahydropterin synthase signature 1.
CONSENSUS: C-N-N-x(2)-G-H-G-H-N-Y. 

NAME: 6-pyruvoyl tetrahydropterin synthase signature 2. 
CONSENSUS: D-H-K-N-L-D-x-D. 

NAME: Ferrochelatase signature.

CONSENSUS: [LIVMF](2)-x-S-x-H-[GS]-[LIVM]-P-x(4,5)-[DENQKR]-x-G-D-x-Y.

NAME: Alanine racemase pyridoxal-phosphate attachment site.
CONSENSUS: V-x-K-A-[DN]-[GA]-Y-G-H-G.

NAME: Aspartate and glutamate racemases signature 1.

CONSENSUS: [IVA]-[LIVM]-x-C-x(0,1)-N-[ST]-[MSA]-[STH]-[LIVFYSTANK].

NAME: Aspartate and glutamate racemases signature 2.
CONSENSUS: [UVMl(2)-x-[AGK-T-(DEHh 

NAME: Mandelate racemase / muconate lactonizing enzyme family signature 1.
CONSENSUS: A-x-[SAG](2)-[LIVM]-[DE]-x-A-x(2)-D-x(2)-[GA]-[KR].

NAME: Mandelate racemase / muconate lactonizing enzyme family signature 2.
CONSENSUS: G-x(7)-D-x(9)-A-x(14)-[LIVM]-E-[DENQ]-P-x(4)-[DENQ].

NAME: Ribulose-phosphate 3-epimerase family signature 1.
CONSENSUS: [UVMF]-H-[UVMFY]-D-[IJVM]-x^ 

NAME: Ribulose-phosphate 3-epimerase family signature 2. 

CONSENSUS: [LIVMA]-x-[LIVM]-M-[ST]-[VS]-x-P-x(3)-G-Q-x-F-x(6)-[NK]-[LIVMC].

NAME: Aldose 1-epimerase putative active site.
CONSENSUS: [NS]-x-T-N-H-x-Y-[FW]-N-[LI].

NAME: Cyclophilin-type peptidyl-prolyl cis-trans isomerase signature.

CONSENSUS: [FY]-x(2)-[STCNLV]-x-F-H-[RH]-[LIVMN]-[LIVM]-x(2)-F-[LIVM]-x-Q-[AG]-G.

NAME: Cyclophilin-type peptidyl-prolyl cis-trans isomerase profile.

NAME: FKBP-type peptidyl-prolyl cis-trans isomerase signature 1.

CONSENSUS: [LIVMC]-x-[YF]-x-[GVL]-x(1,2)-[LFT]-x(2)-G-x(3)-[DE]-[STAEQK]-[STAN].

NAME: FKBP-type peptidyl-prolyl cis-trans isomerase signature 2.

CONSENSUS: [LIVMFY]-x(2)-[GA]-x(3,4)-[LIVMF]-x(2)-[LIVMFHK]-x(2)-G-x(4)-[LIVMF]-
CONSENSUS: x(3)-[PSGAQ]-x(2)-[AG]-[FY]-G.

NAME: FKBP-type peptidyl-prolyl cis-trans isomerase domain profile. 

NAME: PpiC-type peptidyl-prolyl cis-trans isomerase signature. 

CONSENSUS: F-[GSADEI]-x-[LVAQ]-A-x(3)-[ST]-x(3,4)-[STQ]-x(3,5)-[GER]-G-x-[LIVM]-
CONSENSUS: [GS].

NAME: Triosephosphate isomerase active site.
CONSENSUS: [AV]-Y-E-P-[LIVM]-W-[SA]-I-G-T-[GK].

NAME: Xylose isomerase signature 1.
CONSENSUS: [LI]-E-P-K-P-x(2)-P.

NAME: Xylose isomerase signature 2.

CONSENSUS: [FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE].

NAME: Phosphomannose isomerase type I signature 1. 
CONSENSUS: Y-x-D-x-N-H-K-P-E. 

NAME: Phosphomannose isomerase type I signature 2. 

CONSENSUS: H-A-Y-[LIVM]-x-G-x(2)-[LIVM]-E-x-M-A-x-S-D-N-x-[LIVM]-R-A-G-x-T-P-K.

NAME: Phosphoglucose isomerase signature 1.

CONSENSUS: [DENS]-x-[LIVM]-G-G-R-[FY]-S-[LIVMT]-x-[STA]-[PSAC]-[LIVMA]-G.

NAME: Phosphoglucose isomerase signature 2.

CONSENSUS: [GS]-x-[LIVM]-[LIVMFYW]-x(4)-[FY]-[DN]-Q-x-G-V-E-x(2)-K.

NAME: Glucosamine/galactosamine-6-phosphate isomerases signature.

CONSENSUS: [LIVM]-x(3)-G-x-[LIT]-x-[LIV]-x-[LIVM]-x-G-[LIVM]-G-x-[DEN]-G-H.

NAME: Phosphoglycerate mutase family phosphohistidine signature. 
CONSENSUS: [LIVM]-x-R-H-G-[EQ]-x(3)-N. 

NAME: Phosphoglucomutase and phosphomannomutase phosphoserine signature. 
CONSENSUS: [GSA]-[LIVM]-x-[LIVM]-[ST]-[PGA]-S-H-x-P-x(4)-[GNHE].

NAME: Methylmalonyl-CoA mutase signature.

CONSENSUS: R-I-A-R-N-[TQ]-x(2)-[LIVMFY](2)-x-[EQ]-E-x(4)-[KRN]-x(2)-D-P-x-[GSA]-
CONSENSUS: G-S.

NAME: Terpene synthases signature.

CONSENSUS: [DE]-G-S-W-x-G-x-W-[GA]-[LIVM]-x-[FY]-x-Y-[GA].

NAME: Eukaryotic DNA topoisomerase I active site. 

CONSENSUS: [DEN]-x(6)-[GS]-[IT]-S-K-x(2)-Y-[LIVM]-x(3)-[LIVM].

NAME: Prokaryotic DNA topoisomerase I active site.

CONSENSUS: [EQ]-x-L-Y-[DEQT]-x(3,12)-[LI]-[ST]-Y-x-R-[ST]-[DEQS].

NAME: DNA topoisomerase II signature.
CONSENSUS: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG].

NAME: Aminoacyl-transfer RNA synthetases class-I signature.

CONSENSUS: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-
CONSENSUS: [LIVMFYSTAGPC].

NAME: Aminoacyl-transfer RNA synthetases class-II signature 1.
CONSENSUS: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE].

NAME: Aminoacyl-transfer RNA synthetases class-II signature 2.

CONSENSUS: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY].
NAME: WHEP-TRS domain signature. 

CONSENSUS: [QY]-G-[DNEA]-x-[LIV]-[KR]-x(2)-K-x(2)-[KRNG]-[AS]-x(4)-[LIV]-[DENK]-
CONSENSUS: x(2)-[IV]-x(2)-L-x(3)-K.

NAME: ATP-citrate lyase / succinyl-CoA ligases family signature 1.

CONSENSUS: S-[KR]-S-G-[GT]-[LIVM]-[GST]-x-[EQ]-x(8,10)-G-x(4)-[LIVM]-[GA]-[LIVM]-G-
CONSENSUS: G-D.

NAME: ATP-citrate lyase / succinyl-CoA ligases family active site.
CONSENSUS: G-x(2)-A-x(4,7)-[RQT]-[LIVMF]-G-H-[AS]-[GH].

NAME: ATP-citrate lyase / succinyl-CoA ligases family signature 3.

CONSENSUS: G-x-[IV]-x(2)-[LIVMF]-x-[NA]-G-[GA]-G-[LA]-[STAV]-x(4)-D-x-[LIVM]-x(3)-
CONSENSUS: G-[GRE].

NAME: Glutamine synthetase signature 1.

CONSENSUS: [FYWL]-D-G-S-S-x(6,8)-[DENQSTAK]-[SA]-[DE]-x(2)-[LIVMFY].

NAME: Glutamine synthetase putative ATP-binding region signature.
CONSENSUS: K-P-[LIVMFYA]-x(3,5)-[NPAT]-G-[GSTAN]-G-x-H-x(3)-S.

NAME: Glutamine synthetase class-I adenylation site.
CONSENSUS: K-[LIVM]-x(5)-[LIVMA]-D-[RK]-[DN]-[LI]-Y.

NAME: D-alanine-D-alanine ligase signature 1. 

CONSENSUS: H-G-x(2)-G-E-D-G-x-[LIVMA]-[QSA]-[GSA].

NAME: D-alanine-D-alanine ligase signature 2.

CONSENSUS: [LIV]-x(3)-[GA]-x-[GSAIV]-R-[LIVCA]-D-[LIVMF](2)-x(7,9)-[LI]-x-E-
CONSENSUS: [LIVA]-N-[STP]-x-P-[GA].

NAME: SAICAR synthetase signature 1.

CONSENSUS: [LIVMF](2)-P-[LIVM]-E-x-[LIVM]-[LIVMCA]-R-x(3)-[TA]-G-S.

NAME: SAICAR synthetase signature 2.

CONSENSUS: [LIVM]-[LIVMA]-D-x-K-[LIVMFY]-E-F-G.

NAME: Folylpolyglutamate synthase signature 1.

CONSENSUS: [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIVM](2)-x(3)-[GSK].

NAME: Folylpolyglutamate synthase signature 2.

CONSENSUS: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2).

NAME: Ubiquitin-activating enzyme signature 1.
CONSENSUS: K-A-C-S-G-K-F-x-P.

NAME: Ubiquitin-activating enzyme active site.
CONSENSUS: P-[LIVM]-C-T-[LIVM]-[KRH]-x-[FT]-P.

NAME: Ubiquitin-conjugating enzymes active site. 
CONSENSUS: [FrWLSP>H-[rci-[NH]-[UV]^ 

NAME: Formate-tetrahydrofolate ligase signature 1. 
CONSENSUS: G-[LIVM]-K-G-G-A-A-G-G-G-Y.

NAME: Formate-tetrahydrofolate ligase signature 2.
CONSENSUS: V-A-T-[IV]-R-A-L-K-x-[HN]-G-G.

NAME: Adenylosuccinate synthetase GTP-binding site.
CONSENSUS: Q-W-G-D-E-G-K-G.

NAME: Adenylosuccinate synthetase active site.
CONSENSUS: G-I-[GR]-P-x-Y-x(2)-K-x(2)-R.

NAME: Argininosuccinate synthase signature 1.
CONSENSUS: A-[FY]-S-G-G-L-D-T-S. 

NAME: Argininosuccinate synthase signature 2. 
CONSENSUS: G-x-T-x-K-G-N-D-x(2)-R-F. 

NAME: Phosphoribosylglycinamide synthetase signature. 
CONSENSUS: R-F-G-D-P-E-x-[QM].

NAME: Carbamoyl-phosphate synthase subdomain signature 1.
CONSENSUS: [FYV)-[PS]-[UVmMl^MAJ-rLIV^ 

NAME: Carbamoyl-phosphate synthase subdomain signature 2. 

CONSENSUS: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC].

NAME: ATP-dependent DNA ligase AMP-binding site.
CONSENSUS: [EDQH]-x-K-x-[DN]-G-x-R-[GACIVM].

NAME: ATP-dependent DNA ligase signature 2. 

CONSENSUS: E-G-[LIVMA]-[LIVM](2)-[KR]-x(5,8)-[YW]-[QNEK]-x(2,6)-[KRH]-x(3,5)-K-
CONSENSUS: [LIVMFY]-K.

NAME: NAD-dependent DNA ligase signature 1.

CONSENSUS: K-[LIVM]-D-G-[LIVM]-[SA]-x(4)-Y-x(2)-G-x-L-x(4)-[ST]-R-G-[DN]-G-x(2)-G-
CONSENSUS: [DE]-[DENL].

NAME: NAD-dependent DNA ligase signature 2.

CONSENSUS: [rsn-G-[KR]-[ST]-G-x-[LIVM]-[STNK]-x-[VT]-x(2)-L-x-[PS]-V.

NAME: RNA 3'-terminal phosphate cyclase signature.
CONSENSUS: [RH]-G-x(2)-P-x-G(3)-x-[LIV].

NAME: Lipoate-protein ligase B signature.

CONSENSUS: R-G-G-x(2)-T-[FYW]-H-x(2)-[GH]-Q-x-[LIV]-x-Y.

NAME: Isopenicillin N synthetase signature 1.
CONSENSUS: [RK]-x-[STA]-x(2)-S-x-C-Y-[SL].

NAME: Isopenicillin N synthetase signature 2.

CONSENSUS: [LIVM](2)-x-C-G-[STA]-x(2)-[STAG]-x(2)-T-x-[DNG].

NAME: Site-specific recombinases active site.
CONSENSUS: Y-[LIVAC]-R-[VA]-S-[ST]-x(2)-Q.

NAME: Site-specific recombinases signature 2.

CONSENSUS: G-[DE]-x(2)-[LIVM]-x(3)-[LIVM]-[DT]-R-[LIVM]-[GSA].
NAME: Transposases, Mutator family, signature. 

CONSENSUS: D-x(3)-G-[LIVMF]-x(6)-[STAV]-[LIVMFYW]-[PT]-x-[STAV]-x(2)-[QR]-x-C-x(2)-
CONSENSUS: H.

NAME: Transposases, IS30 family, signature.

CONSENSUS: R-G-x(2)-E-N-x-N-G-[LIVM](2)-R-[QE]-[LIVMFY](2)-P-K.

NAME: Autoinducers synthetases family signature.

CONSENSUS: [LMFY]-R-x(3)-F-x(2)-[KR]-x(2)-W-x-[LIVM]-x(6,9)-E-x-D-x-[FY]-D.

NAME: Thiamine pyrophosphate enzymes signature.

CONSENSUS: [LIVMF]-[GSA]-x(5)-P-x(4)-[LIVMFYW]-x-[LIVMF]-x-G-D-[GSA]-[GSAC].

NAME: Biotin-requiring enzymes attachment site.
CONSENSUS: [GN]-rDEQTR]-x-[LlVMFn-x^ 
CONSENSUS: [SAV]. 

NAME: 2-oxo acid dehydrogenases acyltransferase component lipoyl binding site. 
CONSENSUS: [GN]-x(2)-[LIVF]-x(5)-[LIVFC]-x(2)-[LIVFA]-x(3)-K-[STAIV]-[STAVQDN]-
CONSENSUS: x(2)-[LIVMFS]-x(5)-[GCN]-x-[LIVMFY].

NAME: Putative AMP-binding domain signature. 

CONSENSUS: [LIVMFY]-x(2)-[STG]-[STAG]-G-[ST]-[STEI]-[SG]-x-[PASLIVM]-[KR].

NAME: Molybdenum cofactor biosynthesis proteins signature 1.
CONSENSUS: [LIVM](3)-[LIT](2)-G-G-T-G-x(4)-D.

NAME: Molybdenum cofactor biosynthesis proteins signature 2.

CONSENSUS: S-x-[GS]-x(2)-D-x(5)-[LIVW]-x(10,12)-n-m-x(2)-[KR]-P-G-[KRL]-P-x(2)-
CONSENSUS: [LIVMF]-[GA].

NAME: moaA / nifB / pqqE family signature. 

CONSENSUS: [LIV]-x(3)-C-[NP]-[LIVMF]-[QRS]-C-x-[FYM]-C.

NAME: Radical activating enzymes signature.

CONSENSUS: [GV]-x-G-x-[KR]-x(3)-F-x(2)-G-x(0,1)-C-x(3)-C-x(2)-C-x-[NL].

NAME: Tpx family signature.

CONSENSUS: S-x-D-L-P-F-A-x(2)-[KR]-[FW]-C.

NAME: Cytochrome c family heme-binding site signature. 
CONSENSUS: C-{CPWHF}-{CPWR}-C-H-{CFYW}. 

NAME: Cytochrome b5 family, heme-binding domain signature. 
CONSENSUS: [FY]-[LIVMK]-x(2)-H-P-[GA]-G.

NAME: Cytochrome b/b6 heme-ligand signature. 

CONSENSUS: [DENQ]-x(3)-G-[FYWMQ]-x-[LIVMF]-R-x(2)-H. 

NAME: Cytochrome b/b6 Qo site signature. 
CONSENSUS: P-[DE]-W-[FY]-[LFY](2).

NAME: Cytochrome b559 subunits heme-binding site signature. 
CONSENSUS: [UV>x-[STHLIvn-R-|TYW]-xW 

NAME: Nickel-dependent hydrogenases b-type cytochrome subunit signature 1.
CONSENSUS: R-[UVMFW>x-H-W-[LIVM>x(2)-[U^ 

NAME: Nickel-dependent hydrogenases b-type cytochrome subunit signature 2. 

CONSENSUS: [RH]-[STA]-[LIVMFYW]-H-[RH]-[LIVM]-x(2)-W-x-[LIVMF]-x(2)-F-x(3)-H.

NAME: Succinate dehydrogenase cytochrome b subunit signature 1.

CONSENSUS: R-P-[LIVMT]-x(3)-[LIVM]-x(6)-[LIVMWPK]-x(4)-S-x(2)-H-R-x-[ST].

NAME: Succinate dehydrogenase cytochrome b subunit signature 2.

CONSENSUS: H-x(3)-[GA]-[LIVMT]-R-[HF]-[LIVMF]-x-[FYWM]-D-x-[GVA].

NAME: Thioredoxin family active site. 

CONSENSUS: [LIVMin-[UVMSTA]-x-[LIVMFYCHFYW^^ 
CONSENSUS: [PHYWSTA]-C-x(6)-[LIVMFYWT].

NAME: Glutaredoxin active site. 

CONSENSUS: [LIVD]-[FYSA]-x(4)-C-[PV]-[FYW]-C-x(2)-[TAV]-x(2,3)-[LIV].

NAME: Type-1 copper (blue) proteins signature.

CONSENSUS: [GA]-x(0,2)-[YSA]-x(0,1)-[VFY]-x-C-x(1,2)-[PG]-x(0,1)-H-x(2,4)-[MQ].

NAME: 2Fe-2S ferredoxins, iron-sulfur binding region signature.
CONSENSUS: C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C.

NAME: Adrenodoxin family, iron-sulfur binding region signature.
CONSENSUS: C-x(2)-[STAQ]-x-[STAMV]-C-[STA]-T-C-[HR].

NAME: 4Fe-4S ferredoxins, iron-sulfur binding region signature.
CONSENSUS: C-x(2)-C-x(2)-C-x(3)-C-[PEG].

NAME: High potential iron-sulfur proteins signature.
CONSENSUS: C-x(6,9)-[LIVM]-x(3)-G-[YW]-C-x(2)-[FYW].

NAME: Rieske iron-sulfur protein signature 1.
CONSENSUS: C-[TK]-H-L-G-C-[LIVT].

NAME: Rieske iron-sulfur protein signature 2.
CONSENSUS: C-P-C-H-x-[GSA].

NAME: Flavodoxin signature. 

CONSENSUS: [UV]-[UVFmFY>x4STJ-x(2MAGa-x-^^ 

NAME: Rubredoxin signature.

CONSENSUS: [LIVM]-x(3)-W-x-C-P-x-C-[AGD].

NAME: Electron transfer flavoprotein alpha-subunit signature.
CONSENSUS: [U]-Y-(LIVMHAT]-x4HI\n-[SD)-G-x^ 
CONSENSUS: [IV]-N. 

NAME: Electron transfer flavoprotein beta-subunit signature.

CONSENSUS: [IVA]-x-[KR]-x(2)-[DE]-[GD]-[GDE]-x(1,2)-[EQ]-x-[LIV]-x(4)-P-x-[LIVM](2)-
CONSENSUS: [TAC]. 

NAME: Vertebrate metallothioneins signature.

CONSENSUS: C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K.

NAME: Ferritin iron-binding regions signature 1.

CONSENSUS: E-x-[KR]-E-x(2)-E-[KR]-[LF]-[LIVMA]-x(2)-Q-N-x-R-x-G-R.

NAME: Ferritin iron-binding regions signature 2.

CONSENSUS: D-x(2)-[LIVMF]-[STAC]-[DH]-F-[LI]-[EN]-x(2)-[FY]-L-x(6)-[LIVM]-[KN].

NAME: Bacterioferritin signature.

CONSENSUS: <M-x-G-x(3)-V-[LIV]-x(2)-[LM]-x(3)-L-x(3)-L.

NAME: Transferrins signature 1.

CONSENSUS: Y-x(0,1)-[VAS]-V-[IVAC]-[IVA]-[IVA]-[RKH]-[RKS]-[GDENSA].

NAME: Transferrins signature 2.

CONSENSUS: Y-x-G-A-[FL]-[KRHNQ]-C-L-x(3,4)-G-[DENQ]-V-[GA]-[FYW].

NAME: Transferrins signature 3.

CONSENSUS: [DENQ]-[YF]-x-[LY]-L-C-x-[DN]-x(5,8)-[LIV]-x(4,5)-C-x(2)-A-x(4)-[HQR]-x-
CONSENSUS: [LIVMFYW]-[LIVM].

NAME: Globins profile. 

NAME: Protozoan/cyanobacterial globins signature. 

CONSENSUS: F-[LF]-x(5)-G-[PA]-x(4)-G-[KRA]-x-[LIVM]-x(3)-H.

NAME: Plant hemoglobins signature.
CONSENSUS: [SN]-P-x-L-x(2)-H-A-x(3)-F.

NAME: Hemerythrins signature.
CONSENSUS: W-L-x-[NQ]-H-I-x(3)-D-F.

NAME: Arthropod hemocyanins / insect LSPs signature 1.
CONSENSUS: Y-[FYW]-x-E-D-[LIVM]-x(2)-N-x(6)-H-x(3)-P. 

NAME: Arthropod hemocyanins / insect LSPs signature 2. 
CONSENSUS: T-x(2)-R-D-P-x-fIT]-[FYW]. 

NAME: Heavy-metal-associated domain. 

CONSENSUS: [LIVN]-x(2)-[LIVMFA]-x-C-x-[STAGCDNH]-C-x(3)-[LIVFG]-x(3)-[LIV]-x(9,11)-
CONSENSUS: [IVA]-x-[LVFYS].

NAME: ABC transporters family signature. 

CONSENSUS: [LIVMFYC]-[SA]-[SAPGLVFYKQH]-G-[DENQMW]-[KRQASPCLIMFW]-[KRNQSTAVM]-
CONSENSUS: [KRACLVMl-rLIVMFrT>A>n-{PHY}-^^ 
CONSENSUS: [LIVMFYWSTA].

NAME: Binding-protein-dependent transport systems inner membrane comp. sign. 

CONSENSUS: [LIVMFY]-x(8)-[EQR]-[STAGV]-[STAG]-x(3)-G-[LIVMFYSTAC]-x(5)-[LIVMFYSTA]-
CONSENSUS: x(4)-[LIVMFY]-[PKR].

NAME: ABC-2 type transport system integral membrane proteins signature. 
CONSENSUS: [LrMST]-x(2)-[UMW]-x<2)-[LlMCAHG^ 
CONSENSUS: x(9,12)-P-[LIMFT]-x-[HRSY]-x(5)-[RQ].

NAME: Bacterial extracellular solute-binding proteins, family 1 signature. 
CONSENSUS: (GAF]-{UVMFAHSTAVDN]-x(4MGSA^[I^ 
CONSENSUS: [KNDE].

NAME: Bacterial extracellular solute-binding proteins, family 3 signature. 
CONSENSUS: G-{FraMDEHUV\n>^[UVM^ 

NAME: Bacterial extracellular solute-binding proteins, family 5 signature.
CONSENSUS: [AG]-x(6,7)-[DNEG]-x(2)-[STAVE]-[LTVMFr^^
CONSENSUS: [KRHDE]-[GDN]-[LIVMA]-[KNGSP]-[FW].

NAME: Serum albumin family signature.

CONSENSUS: [FY]-x(6)-C-C-x(7)-C-[LFY]-x(6)-[LIVMFYW].

NAME: Transthyretin signature 1.

CONSENSUS: S-K-C-P-L-M-V-K-V-L-D-[AS]-V-R-G.

NAME: Transthyretin signature 2.

CONSENSUS: S-P-[FY]-S-[FY]-S-T-T-A-[LIVM]-V-[ST]-x-P.

NAME: Avidin / Streptavidin family signature.

CONSENSUS: [DEN]-x(2)-[KR]-[STA]-x(2)-V-G-x-[DN]-x-[FW]-T-[KR].

NAME: Eukaryotic cobalamin-binding proteins signature.
CONSENSUS: [SN]-V-D-T-[GA]-A-[LIVM]-A-x-L-A-[LIVMF]-T-C.

NAME: Lipocalin signature.

CONSENSUS: [DENG]-x-[DENQGSTARK]-x(0,2)-[DENQARK]-[LIVFY]-{CP}-G-{C}-W-[FYWLRH]-x-
CONSENSUS: [LIVMTA].

NAME: Cytosolic fatty-acid binding proteins signature.

CONSENSUS: [GSAIVK]-x-[FYW]-x-[LIVMF]-x(4)-[NHG]-[FY]-[DE]-x-[LIVMFY]-[LIVM]-x(2)-
CONSENSUS: [LIVMAKR].

NAME: Acyl-CoA-binding protein signature. 

CONSENSUS: P-[STAJ-x-pEhq-x-[LIVMFH(2MLIVMFn^ 
NAME: LBP / BPI / CETP family signature. 

CONSENSUS: [PA]-[GA]-[LIVMC]-x(2)-R-[IV]-[ST]-x(3)-L-x(5)-[EQ]-x(4)-[LIVM]-[EQK]-
CONSENSUS: x(8)-P.

NAME: Phosphatidylethanolamine-binding protein family signature.
CONSENSUS: [FY]-x-[LIVMF](3)-x-[DC]-P-D-x-P-[SN]-x(10)-H.

NAME: Plant lipid transfer proteins signature. 

CONSENSUS: [UVMMPAJ-x(2K-x-[UVM]-x-[UVM]-x-[LW^ 
CONSENSUS: [DN]-C-x(2)-[LIVM]. 

NAME: Uteroglobin family signature 1.

CONSENSUS: [GA]-x(3)-I-C-P-x-[LIVMF]-x(3)-[LIVM]-[DE]-x-[LIVMF](2).

NAME: Uteroglobin family signature 2.

CONSENSUS: [DEQ]-x(4)-[SN]-x(5)-[DEQ]-x-I-x(2)-S-[PSE]-[LS]-C.

NAME: Mitochondrial energy transfer proteins signature.

CONSENSUS: P-x-[DE]-x-[LIVAT]-[RK]-x-[LRH]-[LIVMFY]-[QMAIGV].

NAME: Sugar transport proteins signature 1.

CONSENSUS: [LIVMSTAG]-[LIVMFSAG]-x(2)-[LIVMSA]-[DE]-x-[LIVMFYWA]-G-R-[RK]-x(4,6)-
CONSENSUS: [GSTA].

NAME: Sugar transport proteins signature 2.

CONSENSUS: [LIVMF]-x-G-[LIVMFA]-x(2)-G-x(8)-[LIFY]-x(2)-[EQ]-x(6)-[RK].

NAME: LacY family proton/sugar symporters signature 1.

CONSENSUS: G-[LIVM](2)-x-D-[RK]-L-G-L-[RK](2)-x-[LIVM](2)-W.

NAME: LacY family proton/sugar symporters signature 2.

CONSENSUS: P-x-[LIVMF](2)-N-R-[LIVM]-G-x-K-N-[STA]-[LIVM](3).

NAME: PTR2 family proton/oligopeptide symporters signature 1.
CONSENSUS: (GAHGAS]-[UVMFYWA]-[LIVMHGAS]^ 
CONSENSUS: [IV]-x(3)-[GSTAV]-x-[LIVMF]-x(3)-[GA].

NAME: PTR2 family proton/oligopeptide symporters signature 2. 
CONSENSUS: [FH>x(2)-[UrfFYHFYV]-[UVMF¥TVA]-x-^ 

NAME: Amiloride-sensitive sodium channels signature. 
CONSENSUS: Y-xaMEQTF]-x^-x<2MGSTDNLK-x-(Q^xa^ 

NAME: Sodium:alanine symporter family signature.

CONSENSUS: G-G-x-[GA](2)-[LIVM]-F-W-M-W-[LIVM]-x-[STAV]-[LIVMFA](2)-G.

NAME: Sodium:dicarboxylate symporter family signature 1.

CONSENSUS: P-x(0,1)-G-[DE]-x-[LIVMF](2)-x-[LIVM](2)-[KREQ]-[LIVM](3)-x-P.

NAME: Sodium:dicarboxylate symporter family signature 2.

CONSENSUS: P-x-G-x-[STA]-x-[NT]-[LIVMC]-D-G-[STAN]-x-[LIVM]-[FY]-x(2)-[LIVM]-x(2)-
CONSENSUS: [LIVM]-[FY]-[LI]-[SA]-Q.

NAME: Sodium:galactoside symporter family signature.

CONSENSUS: D-x(3)-G-x(3)-[DN]-x(6,8)-G-[KH]-F-[KR]-P-[FYW]-[LIVM](2)-x-[GSTA](2).

NAME: Sodium:neurotransmitter symporter family signature 1.
CONSENSUS: W-R-F-[GP]-Y-x(4)-N-G-G-G-x-[FY].

NAME: Sodium:neurotransmitter symporter family signature 2.
CONSENSUS: Y-[LIVMr^-x<2MSC]-rLIVMFYHS^ 

NAME: Sodium:solute symporter family signature 1. 
CONSENSUS: [GS]-x(2HLlY]-x(3)-[UVMF™STAG](W 
CONSENSUS: [SAP]. 

NAME: Sodium:solute symporter family signature 2.

CONSENSUS: [GAST]-[LIVM]-x(3)-[KR]-x(4)-G-A-x(2)-[GAS]-[LIVMGS]-[LIVMW]-[LIVMGAT]-G-
CONSENSUS: x-[LIVMG].

NAME: Sodium:sulfate symporter family signature.

CONSENSUS: [STACP]-S-x(2)-F-x(2)-P-[LIVM]-[GSA]-x(3)-N-x-[LIVM]-V.

NAME: glpT family of transporters signature. 
CONSENSUS: R-G-x(5)-W-N-x(2)-H-N-x-G-G. 

NAME: Ammonium transporters signature. 

CONSENSUS: D-[FYWS]-A-G-[GSC]-x(2)-[IV]-x(3)-[SAG](2)-x(2)-[SAG]-[LIVMF]-x(3)-
CONSENSUS: [LIVMFYWA](2)-x-[GK]-x-R.

NAME: BCCT family of transporters signature.
CONSENSUS: [GSDN]-W-T-[LIVM]-x-[FY]-W-x-W-W.

NAME: Flagellar motor protein motA family signature. 
CONSENSUS: A-[LMF]-x-[GAT]-T-[LIVF]-x-G-x-[LIVMF]-x(2)-P.

NAME: Formate and nitrite transporters signature 1. 

CONSENSUS: [LIVMA]-[LIVMY]-x-G-[GSTA]-[DES]-L-[FI]-[TN]-[GS].

NAME: Formate and nitrite transporters signature 2.
CONSENSUS: [GA]-x(2)-[CA]-N-[LIVMFYW](2)-V-C-[LV]-A.

NAME: Prokaryotic sulfate-binding proteins signature 1.
CONSENSUS: K-x-fNQEKHG^G-[DQ]-x-[LiVM>x<3)^S. 

NAME: Prokaryotic sulfate-binding proteins signature 2.
CONSENSUS: N-P-K-[ST]-S-G-x-A-R. 

NAME: Sulfate transporters signature. 

CONSENSUS: P-x-Y-[GS]-L-Y-[STAG](2)-x(4)-[LIVMFY](3)-x(3)-[GSTA](2)-S-[KR].

NAME: Amino acid permeases signature.

CONSENSUS: [STAGC]-G-[PAG]-x(2,3)-[LIVMFYWA](2)-x-[LIVMFYW]-x-[LIVMFWSTAGC](2)-
CONSENSUS: [STAGC]-x(3)-[LIVMFYW]-x-[LIVMST]-x(3)-[LIMCTA]-[GA]-E-x(5)-[PSAL].

NAME: Aromatic amino acids permeases signature.

CONSENSUS: I-G-[GA]-G-M-[LF]-[SA]-x-P-x(3)-[SA]-G-x(2)-F.

NAME: Xanthine/uracil permeases family signature. 
CONSENSUS: [UVNq-P-x-rPASIF]-V-[U^ 

NAME: Anion exchangers family signature 1.

CONSENSUS: F-G-G-[LIVM](2)-[KR]-D-[LIVM]-[RK]-R-R-Y.

NAME: Anion exchangers family signature 2.
CONSENSUS: [FI]-L-I-S-L-I-F-I-Y-E-T-F-x-K-L.

NAME: MIP family signature.

CONSENSUS: [HNQA]-x-N-P-[STA]-[LIVMF]-[ST]-[LIVMF]-[GSTAFY].

NAME: General diffusion Gram-negative porins signature.

CONSENSUS: [LIVMFY]-x(2)-G-x(2)-Y-x-F-x-K-x(2)-[SN]-[STAV]-[LIVMFYW]-V.
NAME: OmpA-like domain. 

CONSENSUS: [LIVMA]-x-[GT]-x-[TA]-[DA]-x(2)-[DG]-[GSTP]-x(2)-[LFYDE]-[NQS]-x(2)-
CONSENSUS: [LI]-[SG]-[QE]-[KRQE]-R-A-x(2)-[LV]-x(3)-[LIVMF]-x(4,5)-[LIVM]-x(4)-
CONSENSUS: [LIVM]-x(3)-[SG]-x-G.

NAME: Eukaryotic mitochondrial porin signature.

CONSENSUS: [YH]-x(2)-D-[SPA]-x-[STA]-x(3)-[TAG]-[KR]-[LIVMF]-[DNSTA]-[DNS]-x(4)-
CONSENSUS: [GSTAN]-[LIVMA]-x-[LIVMY].

NAME: Insulin-like growth factor binding proteins signature.
CONSENSUS: G-C-[GS]-C-C-x(2)-C-A-x(6)-C.

NAME: GPR1/FUN34/yaaH family signature.
CONSENSUS: N-P-[AV]-P-[LF]-G-L-x-[GSA]-F.

NAME: GNS1/SUR4 family signature. 
CONSENSUS: L-x-F-L-H-x-Y-H-H. 

NAME: 43 Kd postsynaptic protein signature. 
CONSENSUS: G-Q-D-Q-T-K-Q-Q-I. 

NAME: Actins signature 1. 

CONSENSUS: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G.

NAME: Actins signature 2.

CONSENSUS: W-[IV]-[STA]-[RK]-x-[DE]-Y-[DNE]-[DE].

NAME: Actins and actin-related proteins signature.

CONSENSUS: [LM]-[LIVM]-T-E-[GAPQ]-x-[LIVMFYWHQ]-N-[PSTAQ]-x(2)-N-[KR].

NAME: Annexins repeated domain signature.

CONSENSUS: [TG]-[STV]-x(8)-[LIVMF]-x(2)-R-x(3)-[DEQNH]-x(7)-[IFY]-x(7)-[LIVMF]-
CONSENSUS: x(3)-[LIVMF]-x(11)-[LIVMFA]-x(2)-[LIVMF].

NAME: Caveolins signature. 
CONSENSUS: F-E-D-V-I-A-E-P. 

NAME: Clathrin light chain signature 1. 
CONSENSUS: F-L-A-Q-Q-E-S. 

NAME: Clathrin light chain signature 2. 

CONSENSUS: [KR]-D-x-S-[KR]-[LIVM]-[KR]-x-[LIVM](3)-x-L-K.

NAME: Clusterin signature 1. 
CONSENSUS: C-K-P-C-L-K-x-T-C. 

NAME: Clusterin signature 2. 

CONSENSUS: C-L-[RK]-M-[RK]-x-[EQ]-C-[ED]-K-C. 
NAME: Connexins signature 1.

CONSENSUS: C-[DN]-T-x-Q-P-G-C-x(2)-V-C-Y-D.

NAME: Connexins signature 2.

CONSENSUS: C-x(3,4)-P-C-x(3)-[LIVM]-[DEN]-C-[FY]-[LIVM]-[SA]-[KR]-P.
NAME: Crystallins beta and gamma 'Greek key' motif signature. 

CONSENSUS: [LIVMr^A]-x-{DEHRKSTP}-[FY]-[DEQHKYl-x(3)-[Fr1-x-G-x(4)-(UVM 
NAME: Dynamin family signature. 

CONSENSUS: L-P-[RK]-G-[STN]-[GN]-[LIVM]-V-T-R. 

NAME: Dynein light chain type 1 signature.

CONSENSUS: H-x-I-x-G-[KR]-x-F-[GA]-S-x-V-[ST]-[HY]-E.

NAME: FtsZ protein signature 1. 

CONSENSUS: N-[ST]-D-x-Q-x-L-x(16J8>-C^x^(ATV]-G-[GSANl-x-P-x(2)-G. 

NAME: FtsZ protein signature 2. 

CONSENSUS: [DNHKR]-[LIYTtfH-x-{LIVMina)^ 

CONSENSUS: [GSAR]-[STA]-P-[LIVMFT]-[LIVMF]-[SGAV].

NAME: Fungal hydrophobins signature.

CONSENSUS: [GN]-[DNQPSA]-x-C-[GSTANK]-[GSTADNQ]-[STNQI]-[PTIV]-x-C-C-[DENQKPST].

NAME: Intermediate filaments signature.

CONSENSUS: [IV]-x-[TACI]-Y-[RKH]-x-[LM]-L-[DE].

NAME: Involucrin signature. 

CONSENSUS: <M-S-[QH]-Q-x-T-[LV]-P-V-T-[LV].

NAME: Kinesin motor domain signature.

CONSENSUS: [GSA]-[KRHPSTQVM]-[LIVMF]-x-[LIVMF]-[IVC]-D-L-[AH]-G-[SAN]-E.

NAME: Kinesin motor domain profile.

NAME: Kinesin light chain repeat.

CONSENSUS: [DEQR]-A-L-x(3)-[GEQ]-x(3)-G-x-[DNS]-x-P-x-V-A-x(3)-N-x-L-[AS]-
CONSENSUS: x(5)-[QR]-x-[KR]-[FY]-x(2)-[AV]-x(4)-[HKNQ].

NAME: Myelin basic protein signature. 
CONSENSUS: V-V-H-F-F-K-N. 

NAME: Myelin P0 protein signature.

CONSENSUS: S-[KR]-S-x-K-[AG]-x-[SA]-E-K-K-[STA]-K.

NAME: Myelin proteolipid protein signature 1.
CONSENSUS: G-[MV]-A-L-F-C-G-C-G-H.

NAME: Myelin proteolipid protein signature 2.

CONSENSUS: C-x-[ST]-x-[DE]-x(3)-[ST]-[FY]-x-L-[FY]-I-x(4)-G-A.

NAME: Neuromodulin (GAP-43) signature 1.
CONSENSUS: <M-L-C-C-[LIVM]-R-R.

NAME: Neuromodulin (GAP-43) signature 2.
CONSENSUS: S-F-R-G-H-I-x-R-K-K-[LIVM].

NAME: Osteopontin signature.

CONSENSUS: [KQ]-x-[TA]-x(2)-[GA]-S-S-E-E-K.

NAME: Peripherin / rom-1 signature.

CONSENSUS: D-[GS]-V-P-F-[ST]-C-C-N-P-x-S-P-R-P-C.

NAME: Profilin signature.

CONSENSUS: <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ].

NAME: Surfactant associated polypeptide SP-C palmitoylation sites.
CONSENSUS: I-P-C-C-P-V.

NAME: Synapsins signature 1. 
CONSENSUS: L-R-R-R-L-S-D-S. 

NAME: Synapsins signature 2. 
CONSENSUS: G-H-A-H-S-G-M-G-K-V-K. 

NAME: Synaptobrevin signature. 

CONSENSUS: N-[LIVM]-[DENS]-[KL]-V-x-[DEQ]-R-x(2)-[KR]-[LIVM]-[STDE]-x-[L
CONSENSUS: [KR]-[TA]-[DE].

NAME: Synaptophysin / synaptoporin signature.
CONSENSUS: L-S-V-[DE]-C-x-N-K-T.

NAME: Tropomyosins signature. 
CONSENSUS: L-K-E-A-E-x-R-A-E. 

NAME: Tubulin subunits alpha, beta, and gamma signature. 

CONSENSUS: [SAG]-G-G-T-G-[SA]-G.

NAME: Tubulin-beta mRNA autoregulation signal.
CONSENSUS: <M-R-[DE]-[IL].

NAME: Tau and MAP proteins tubulin-binding domain signature.
CONSENSUS: G-S-x(2)-N-x(2)-H-x-[PA]-[AG]-G(2).

NAME: Neuraxin and MAP1B proteins repeated region signature.
CONSENSUS: [STAGDN]-Y-x-Y-E-x(2)-[DE]-[KR]-[STACX:iJ.

NAME: F-actin capping protein alpha subunit signature 1.
CONSENSUS: V-H-[FY](2)-E-D-G-N-V.

NAME: F-actin capping protein alpha subunit signature 2. 
CONSENSUS: F-K-[AE]-L-R-R-x-L-P. 

NAME: F-actin capping protein beta subunit signature. 
CONSENSUS: C-D-Y-N-R-D. 

NAME: Vinculin family talin-binding region signature. 

CONSENSUS: [KR]-x-[LIVMF]-x(3)-[LIVMA]-x(2)-[LIVM]-x(6)-R-Q-Q-E-L.

NAME: Vinculin repeated domain signature.
CONSENSUS: [LIVM]-x-[QA]-A-x(2)-W-[IL]-x-[DN]-P.

NAME: Amyloidogenic glycoprotein extracellular domain signature. 
CONSENSUS: G-[VT]-E-[FY]-V-C-C-P. 

NAME: Amyloidogenic glycoprotein intracellular domain signature. 
CONSENSUS: G-Y-E-N-P-T-Y-[KR]. 

NAME: Cadherins extracellular repeated domain signature. 
CONSENSUS: [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P.

NAME: Insect cuticle proteins signature.

CONSENSUS: G-x(7)-[DEN]-G-x(6)-Y-x-A-[DNG]-x(2,3)-G-[FY]-x-[AP].

NAME: Gas vesicles protein GVPa signature 1.
CONSENSUS: ru\^- x .(DEMUVMI^-[LIVW 

NAME: Gas vesicles protein GVPa signature 2. 

CONSENSUS: R-[LIVA](3)-A-[GS]-[LIVMFY]-x-T-x(3)-Y-[AG].

NAME: Gas vesicles protein GVPc repeated domain signature.
CONSENSUS: F-L-x(2)-T-x(3)-R-x(3)-A-x(2)-Q-x(3)-L-x(2)-F.

NAME: Bacterial microcompartiments proteins signature.

CONSENSUS: D-x(0,1)-M-x-K-[SAG](2)-x-[IV]-x-[LIVM]-[LIVMA]-[GCS]-x(4)-[GD]-[SGPD]-
CONSENSUS: [GA].

NAME: Flagella basal body rod proteins signature.

CONSENSUS: [GTARYQ]-x(9)-[LIVMYSTA](2)-[GSTA]-[STADEN]-N-[LIVM]-[SAN]-N-x-[SADNFR]-
CONSENSUS: [STV].

NAME: Flagella transport protein fliP family signature 1.
CONSENSUS: [PA]-A-[Fn-x-[LI>m-[STO]-^ 

NAME: Flagella transport protein fliP family signature 2. 
CONSENSUS: P-[LIVMF]-K-[LIVMF](5)-x-[LIVMA]-[DNGS]-G-W.

NAME: Plant viruses icosahedral capsid proteins 'S' region signature.

CONSENSUS: [FYW]-x-[PSTA]-x(7)-G-x-[LIVM]-x-[LIVM]-x-[FYWI]-x(2)-D-x(5)-P.

NAME: Potexviruses and carlaviruses coat protein signature.

CONSENSUS: [RK]-[FYW]-A-[GAP]-F-D-x-F-x(2)-[LV]-x(3)-[GAST](2).

NAME: Neurotransmitter-gated ion-channels signature.
CONSENSUS: C-x-[LIVMFQ]-x-[LIVMF]-x(2)-[FY]-P-x-D-x(3)-C.

NAME: ATP P2X receptors signature. 

CONSENSUS: G-G-x-[LIVM]-G-[LIVM]-x-[IV]-x-W-x-C-[DN]-L-D-x(5)-C-x-P-x-Y-x-F.

NAME: G-protein coupled receptors signature. 

CONSENSUS: [GSTALINMFmCHGSTANCPDEMEDPKRHM^ 
CONSENSUS: [GSTANC]-[UVMFmSTAC]-[DEhTH]-R-[r^CSr^ 

NAME: G-protein coupled receptors family 2 signature 1 . 

CONSENSUS: C-x(3)-[FYWLIV]-D-x(3,4)-C-[FW]-x(2)-[STAGV]-x(8,9)-C-[PF].

NAME: G-protein coupled receptors family 2 signature 2. 
CONSENSUS: Q^[IJrfFCA]-[UVMFn-[UV]-x-[UVF^ 
NAME: G-protein coupled receptors family 3 signature 1.

CONSENSUS: [LV]-x-N-[LIVM](2)-x-L-F-x-I-[PA]-Q-[LIVM]-[STA]-x-[STA](3)-[STAN].
NAME: G-protein coupled receptors family 3 signature 2.

CONSENSUS: C-C-[FYW]-x-C-x(2)-C-x(4)-[FYW]-x(2,4)-[DN]-x(2)-[STAH]-C-x(2)-C.

NAME: G-protein coupled receptors family 3 signature 3. 
CONSENSUS: F-N-E-[STA]-K-x-I-[STAG]-F-[ST]-M.

NAME: Visual pigments (opsins) retinal binding site. 

CONSENSUS: [LIVMWAC]-[PGAC]-x(3)-[SAC]-K-[STALIMR]-[GSACPNV]-[STACP]-x(2)-[DENF]-
CONSENSUS: [AP]-x(2)-[IY]. 

NAME: Bacterial rhodopsins signature 1. 

CONSENSUS: R-Y-x-[DT]-W-x-[LIVMF]-[ST]-T-P-[LIVM](3).
NAME: Bacterial rhodopsins retinal binding site. 

CONSENSUS: [FYIV]-x-[FYVG]-[LIVM]-D-[LIVMF]-x-[STA]-K-x(2)-[FY].

NAME: Receptor tyrosine kinase class II signature. 
CONSENSUS: [DN]-[LIV]-Y-x(3)-Y-Y-R.

NAME: Receptor tyrosine kinase class III signature. 
CONSENSUS: G-x-H-x-N-[LIVM]-V-N-L-L-G-A-C-T.

NAME: Receptor tyrosine kinase class V signature 1. 
CONSENSUS: F-x-[Dr^x-[GAWMGAK-[LIVMMSAML^ 
CONSENSUS: x(3)-[KR]-C-[PSAW].

NAME: Receptor tyrosine kinase class V signature 2. 
CONSENSUS: C-xttMDEl^MDEQJ-W-xa^HPAQMLIV^ 
CONSENSUS: [EQ].

NAME: Growth factor and cytokines receptors family signature 1. 
CONSENSUS: C-[LVFYR]-x(7,8)-[STIVDN]-C-x-W.

NAME: Growth factor and cytokines receptors family signature 2. 
CONSENSUS: [STGL]-x-W-[SG]-x-W-S. 

NAME: TNFR/NGFR family cysteine-rich region signature. 

CONSENSUS: C-x(4,6)-[FYH]-x(5,10)-C-x(0,2)-C-x(2,3)-C-x(7,11)-C-x(4,6)-[DNEQSKP]-
CONSENSUS: x(2)-C. 

NAME: TNFR/NGFR family cysteine-rich region domain. 

NAME: Integrins alpha chain signature. 
CONSENSUS: [FYWS]-[RK]-x-G-F-F-x-R.

NAME: Integrins beta chain cysteine-rich domain signature. 
CONSENSUS: C-x-[GNQ]-x(1,3)-G-x-C-x-C-x(2)-C-x-C.

NAME: Natriuretic peptides receptors signature. 
CONSENSUS: G-P-x-C-x-Y-x-A-A-x-V-x-R-x(3)-H-W.

NAME: Photosynthetic reaction center proteins signature. 

CONSENSUS: [NH]-x(4)-P-x-H-x(2)-[SAG]-x(11)-[SAGC]-x-H-[SAG](2).

NAME: Antenna complexes alpha subunits signature. 

CONSENSUS: [LIVFAG]-x-[GASV]-[LIVFA]-x-[IV]-H-x(3)-[LIVM]-[GSTAE]-[STANH]-x(1,3)-
CONSENSUS: [STN]-W-[LIVMFYW].

NAME: Antenna complexes beta subunits signature. 

CONSENSUS: [EQ]-x(4)-H-x(5)-[GSTA]-x(3)-[FY]-x(3)-[AG]-x(2)-[AV]-H-x(7)-P.

NAME: Photosystem I psaA and psaB proteins signature. 
CONSENSUS: C-D-G-P-G-R-G-G-T-C. 

NAME: Photosystem I psaG and psaK proteins signature. 

CONSENSUS: G-F-x-[LIVM]-x-[DEA]-x(2)-[GA]-x-[GTA]-[SA]-x-G-H-x-[LIVM]-[GA].

NAME: Phytochrome chromophore attachment site signature.
CONSENSUS: [RGS]-[GSA]-[PV]-H-x-C-H-x(2)-Y.

NAME: Phytochrome chromophore attachment site domain profile. 
NAME: Speract receptor repeated domain signature.

CONSENSUS: G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G.
NAME: TonB-dependent receptor proteins signature 1.

CONSENSUS: <x(10,115)-[DENF]-[ST]-[LIVMF]-[LIVSTEQ]-V-x-[AGP]-[STANEQPK].
NAME: TonB-dependent receptor proteins signature 2. 

CONSENSUS: [LYGSTANE]-x(3)-[GSTAENQ]-x-[PGE]-R-x-[LIVFYWA]-x-[LIVMFTA]-[STAGNQ]-
CONSENSUS: [LIVMFYGTA]-x-[LIVMFYWGTADQ]-x-F>.

NAME: Transmembrane 4 family signature. 

CONSENSUS: G-x(3)-[LIVMF]-x(2)-[GSA]-[LIVMF](2)-G-C-x-[GA]-[STA]-x(2)-[EG]-x(2)-
CONSENSUS: [CWN]-[LIVM](2). 

NAME: Bacterial chemotaxis sensory transducers signature. 

CONSENSUS: R-T-E-[EQ]-Q-x(2)-[SA]-[LIVM]-x-[EQ]-T-A-A-S-M-E-Q-L-T-A-T-V.

NAME: ER lumen protein retaining receptor signature 1.
CONSENSUS: G-I-S-x-[KR]-x-Q-x-L-[FY]-x-[LIV](2)-F-x(2)-R-Y.

NAME: ER lumen protein retaining receptor signature 2. 
CONSENSUS: L-E-[SA]-V-A-I-[LM]-P-Q-L.

NAME: Ephrins signature. 

CONSENSUS: [KRQ]-[LF]-[CST]-x-K-[IF]-Q-x-[FY]-[ST]-[PA]-x(3)-G-x-E-F-x(5)-[FY](2)-
CONSENSUS: x(2)-[SA].

NAME: Granulins signature. 

CONSENSUS: C-x-D-x(2)-H-C-C-P-x(4)-C. 

NAME: HBGF/FGF family signature. 

CONSENSUS: G-x-L-x-[STAGP]-x(6,7)-[DE]-C-x-[FM]-x-E-x(6)-Y.
NAME: PTN/MK heparin-binding protein family signature 1.

CONSENSUS: S-[DE]-C-x-[DE]-W-x-W-x(2)-C-x-P-x-[SN]-x-D-C-G-[LIVMA]-G-x-R-E-G.
NAME: PTN/MK heparin-binding protein family signature 2. 

CONSENSUS: C-[KR]-[LIVM]-P-C-N-W-K-K-x-F-G-A-[DE]-C-K-Y-x-F-[EQ]-x-W-G-x-C.

NAME: Nerve growth factor family signature. 

CONSENSUS: G-C-[KR]-G-[LIV]-[DE]-x(3)-[YW]-x-S-x-C.

NAME: Platelet-derived growth factor (PDGF) family signature. 
CONSENSUS: P-[PS]-C-V-x(3)-R-C-[GSTA]-G-C-C. 

NAME: Small cytokines (intercrine/chemokine) C-x-C subfamily signature. 

CONSENSUS: C-x-C-[LIVM]-x(5,6)-[LIVMFY]-x(2)-[RKSEQ]-x-[LIVM]-x(2)-[LIVM]-x(5)-

CONSENSUS: [SAG]-x(2)-C-x(3)-[EQ]-[LIVM](2)-x(9,10)-C-L-[DN].

NAME: Small cytokines (intercrine/chemokine) C-C subfamily signature. 

CONSENSUS: C-C-[LIFYT]-x(5,6)-[LI]-x(4)-[LIVMF]-x(2)-[FYW]-x(6,8)-C-x(3,4)-[SAG]-

CONSENSUS: [LIVM](2)-[FL]-x(8)-C-[STA].

NAME: TGF-beta family signature. 

CONSENSUS: [LIVM]-x(2)-P-x(2)-[FY]-x(4)-C-x-G-x-C.
NAME: TNF family signature. 

CONSENSUS: [LV]-x-[LIVM]-x(3)-G-[LIVMF]-Y-[LIVMFY](2)-x(2)-[QEKHL]-[LIVMGT]-x-
CONSENSUS: [LIVMFY].

NAME: TNF family profile. 

NAME: Wnt-1 family signature.

CONSENSUS: C-K-C-H-G-[LIVMT]-S-G-x-C.

NAME: Interferon alpha, beta and delta family signature. 

CONSENSUS: [FYH]-[IT]-x-[GNRC]-[LIVM]-x(2)-[FY]-L-x(7)-[CY]-A-W.

NAME: Granulocyte-macrophage colony-stimulating factor signature. 
CONSENSUS: C-P-[LP]-T-x-E-[ST]-x-C.

NAME: Interleukin-1 signature. 

CONSENSUS: [FC]-x-S-[ASLV]-x(2)-P-x(2)-[FYLIV]-[LI]-[SCA]-T-x(7)-[LIVM].
NAME: Interleukin-2 signature.

CONSENSUS: T-E-[LF]-x(2)-L-x-C-L-x(2)-E-L. 

NAME: Interleukins -4 and -13 signature. 

CONSENSUS: L-x-E-[LIVM](2)-x(4,5)-[LIVM]-[TL]-x(5,7)-C-x(4)-[IVA]-x-[DNS]-[LIVMA].

NAME: Interleukin-6 / G-CSF / MGF signature. 
CONSENSUS: C-x(9)-C-x(6)-G-L-x(2)-[FY]-x(3)-L. 

NAME: Interleukin-7 and -9 signature.
CONSENSUS: N-x-[LAP]-[SCT]-F-L-K-x-L-L.

NAME: Interleukin-10 family signature.

CONSENSUS: [GS]-C-x(2)-[LV]-x(2)-[LIVM](2)-x-F-Y-L-x(2)-V.
NAME: LIF / OSM family signature. 

CONSENSUS: [PST]-x(4)-F-[NQ]-x-K-x(3)-C-x-[LF]-L-x(2)-Y-[HK].

NAME: Macrophage migration inhibitory factor family signature. 
CONSENSUS: [DE]-P-C-A-x(3)-[LIVM]-x-S-I-G-x-[LIVM]-G.

NAME: Adipokinetic hormone family signature. 
CONSENSUS: Q-[LV]-[NT]-[FY]-[ST]-x(2)-W.

NAME: Bombesin-like peptides family signature.
CONSENSUS: W-A-x-G-[SH]-[LF]-M. 

NAME: Calcitonin / CGRP / IAPP family signature.

CONSENSUS: C-[SAGDN]-[STN]-x(0,1)-[SA]-T-C-[VMA]-x(3)-[LYF]-x(3)-[LYF].

NAME: Corticotropin-releasing factor family signature.
CONSENSUS: [PQ]-x-[LiVM]-S-[UVM]-x<2)-^ 

NAME: Crustacean CHH/MIH/GIH neurohormones family signature.
CONSENSUS: C-[DENK]-D-C-x-N-[LIV]-[FY]-R-x(7)-C-[KR]-x(2)-C.

NAME: Erythropoietin / thrombopoeitin signature. 
CONSENSUS: P-x(4)-C-D-x-R-[LIVM](2)-x-[KR]-x(14)-C.

NAME: Granins signature 1.

CONSENSUS: [DE]-[SN]-L-[SAN]-x(2)-[DE]-x-E-L.
NAME: Granins signature 2. 

CONSENSUS: C-[LIVM](2)-E-[LIVM](2)-S-[DN]-[STA]-L-x-K-x-S-x(3)-[LIVM]-[STA]-x-E-C.
NAME: Galanin signature. 

CONSENSUS: G-W-T-L-N-S-A-G-Y-L-L-G-P-H.

NAME: Gastrin / cholecystokinin family signature.
CONSENSUS: Y-x(0,1)-[GD]-[WH]-M-[DR]-F.

NAME: Glucagon / GIP / secretin / VIP family signature. 

CONSENSUS: [YH]-[STAIVGD]-[DEQ]-[AGF]-[LIVMSTE]-[FYLR]-x-[DENSTAK]-[DENSTA]-
CONSENSUS: [LIVMFYG]-x(9)-[KREQL]-[KRDENQL]-[LVFYWG]-[LIVQ].

NAME: Glycoprotein hormones alpha chain signature 1.
CONSENSUS: C-x-G-C-C-[FY]-S-R-A-[FY]-P-T-P. 

NAME: Glycoprotein hormones alpha chain signature 2. 
CONSENSUS: N-H-T-x-C-x-C-x-T-C-x(2)-H-K.

NAME: Glycoprotein hormones beta chain signature 1.
CONSENSUS: C-[STAGM]-G-[HFYL]-C-x-[ST].

NAME: Glycoprotein hormones beta chain signature 2. 

CONSENSUS: [PA]-V-A-x(2)-C-x-C-x(2)-C-x(4)-[STD]-[DEY]-C-x(6,8)-[PGSTAVM]-x(2)-C.

NAME: Gonadotropin-releasing hormones signature.
CONSENSUS: Q-H-[FYW]-S-x(4)-P-G. 

NAME: Insulin family signature. 

CONSENSUS: C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C.
NAME: Natriuretic peptides signature.
CONSENSUS: C-F-G-x(3)-D-R-I-x(3)-S-x(2)-G-C. 

NAME: Neurohypophysial hormones signature. 
CONSENSUS: C-[LIFY](2)-x-N-[CS]-P-x-G.

NAME: Neuromedin U signature. 
CONSENSUS: F-[LIVMF]-F-R-P-R-N. 

NAME: Endogenous opioids neuropeptides precursors signature. 

CONSENSUS: C-x(3)-C-x(2)-C-x(2)-[KRH]-x(6,7)-[LIF]-[DN]-x(3)-C-x-[LIVM]-[EQ]-C-
CONSENSUS: [EQ]-x(8)-W-x(2)-C.

NAME: Pancreatic hormone family signature. 

CONSENSUS: [FY]-x(3)-[LIVM]-x(2)-Y-x(3)-[LIVMFY]-x-R-x-R-[YF].

NAME: Parathyroid hormone family signature. 
CONSENSUS: V-S-E-x-Q-x(2)-H-x(2)-G. 

NAME: Pyrokinins signature. 
CONSENSUS: F-[GSTV]-P-R-L-[G>].

NAME: Somatotropin, prolactin and related hormones signature 1.

CONSENSUS: C-x-[ST]-x(2)-[LIVMFY]-x-[LIVMSTA]-P-x(5)-[TALIV]-x(7)-[LIVMFY]-x(6)-
CONSENSUS: [LIVMFY]-x(2)-[STA]-W.

NAME: Somatotropin, prolactin and related hormones signature 2. 

CONSENSUS: C-[LIVMFY]-x(2)-D-[LIVMFYSTA]-x(5)-[LIVMFY]-x(2)-[LIVMFYT]-x(2)-C.

NAME: Tachykinin family signature. 
CONSENSUS: F-[IVFY]-G-[LM]-M-[G>].

NAME: Thymosin beta-4 family signature.
CONSENSUS: K-L-K-K-T-E-T-Q-E-K-N.

NAME: Urotensin II signature. 
CONSENSUS: C-F-W-K-Y-C. 

NAME: Cecropin family signature. 

CONSENSUS: W-x(0,2)-[KDN]-x(2)-K-[KRE]-[LI]-E-[RKN].

NAME: Mammalian defensins signature. 
CONSENSUS: C-x-C-x(3,5)-C-x(7)-G-x-C-x(9)-C-C. 

NAME: Arthropod defensins signature. 

CONSENSUS: C-x(2,3)-[HN]-C-x(3,4)-[GR]-x(2)-G-G-x-C-x(4,7)-C-x-C.
NAME: Cathelicidins signature 1.

CONSENSUS: Y-x-[ED]-x-V-x-[RQ]-A-[LIVMA]-[DQG]-x-[LIVMFY]-N-[EQ].
NAME: Cathelicidins signature 2. 

CONSENSUS: F-x-[LIVM]-K-E-T-x-C-x(10)-C-x-F-[KR]-[KE].

NAME: Endothelin family signature. 
CONSENSUS: C-x-C-x(4)-D-x(2)-C-x(2)-[FY]-C.

NAME: Plant thionins signature. 
CONSENSUS: C-C-x(5)-R-x(2)-[FY]-x(2)-C. 

NAME: Gamma-thionins family signature. 

CONSENSUS: [KR]-x-C-x(3)-[SV]-x(2)-[FYWH]-x-[GF]-x-C-x(5)-C-x(3)-C.
NAME: Snake toxins signature. 

CONSENSUS: G-C-x(1,3)-C-P-x(8,10)-C-C-x(2)-[PDEN].
NAME: Myotoxins signature.

CONSENSUS: K-x^-H-x-K-x(2)-H^-x(2VK-x(3K-x(8^^ 
NAME: Scorpion short toxins signature. 

CONSENSUS: C-x(3)-C-x(6,9)-[GAS]-K-C-[IMQT]-x(3)-C-x-C.

NAME: Heat-stable enterotoxins signature.
CONSENSUS: C-C-x(2)-C-C-x-P-A-C-x-G-C. 
NAME: Aerolysin type toxins signature.
CONSENSUS: [KT]-x(2)-N-W-x(2)-T-[DN]-T.

NAME: Shiga/ricin ribosomal inactivating toxins active site signature. 

CONSENSUS: [LIVMA]-x-[LIVMSTA](2)-x-E-[SAGV]-[STAL]-R-[FY]-[RKNQS]-x-[LIVM]-[EQS]-
CONSENSUS: x(2)-[LIVMF].

NAME: Channel forming colicins signature. 
CONSENSUS: T-x(2)-W-x-P-[LIVMFY](3)-x(2)-E. 

NAME: Hok/gef family cell toxic proteins signature. 

CONSENSUS: [LIVMA](4)-C-[LIVMFA]-T-[LIVMA](2)-x(4)-[LIVM]-x-[RC]-x(2)-L-[CY].

NAME: Staphylococcal enterotoxin/Streptococcal pyrogenic exotoxin signature 1.
CONSENSUS: Y-G-G-[LIV]-T-x(4)-N.

NAME: Staphylococcal enterotoxin/Streptococcal pyrogenic exotoxin signature 2.
CONSENSUS: K-x(2)-[LIV]-x(4)-[LIV]-D-x(3)-R-x(2)-L-x(5)-[LIV]-Y.

NAME: Thiol-activated cytolysins signature. 
CONSENSUS: [RK]-E-C-T-G-L-x-W-E-W-W-[RK].

NAME: Membrane attack complex components / perforin signature. 
CONSENSUS: Y-x(6)-[FY]-G-T-H-[FY].

NAME: Pancreatic trypsin inhibitor (Kunitz) family signature. 
CONSENSUS: F-x(3)-G-C-x(6)-[FY]-x(5)-C.

NAME: Bowman-Birk serine protease inhibitors family signature. 

CONSENSUS: C-x(5,6)-[DENQKRHSTA]-C-[PASTDH]-[PASTDK]-[ASTDV]-C-[NDKS]-[DEKRHSTA]-C.

NAME: Kazal serine protease inhibitors family signature. 
CONSENSUS: C-x(7)-C-x(6)-Y-x(3)-C-x(2,3)-C. 

NAME: Soybean trypsin inhibitor (Kunitz) protease inhibitors family signature. 
CONSENSUS: [LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x-[LIVM].

NAME: Serpins signature. 

CONSENSUS: (UVMr^-x-[LIVMFYACHDNQ]-[RKH(^]-^^ 
CONSENSUS: [LIVMFAH].

NAME: Potato inhibitor I family signature. 

CONSENSUS: [FYW]-P-[EQH]-[LIV](2)-G-x(2)-[STAGV]-x(2)-A.

NAME: Squash family of serine protease inhibitors signature. 
CONSENSUS: C-P-x(5)-C-x(2)-D-x-D-C-x(3)-C-x-C.

NAME: Streptomyces subtilisin-type inhibitors signature.
CONSENSUS: C-x-P-x(2,3)-G-x-H-P-x(4)-A-C-[ATD]-x-L. 

NAME: Cysteine proteases inhibitors signature. 
CONSENSUS: [GSTEQKRV>Q-[LIVT]«[VAFMSAGQ]^ 
CONSENSUS: [DENQKRHSIV]. 

NAME: Tissue inhibitors of metalloproteinases signature. 
CONSENSUS: C-x-C-x-P-x-H-P-Q-x-A-F-C. 

NAME: Cereal trypsin/alpha-amylase inhibitors family signature.

CONSENSUS: C-x(4)-[SAGD]-x(4)-[SPAL]-[LF]-x(2)-C-[RH]-x-[LIVMFY](2)-x(3,4)-C.

NAME: Alpha-2-macroglobulin family thiolester region signature. 
CONSENSUS: [PG]-x-[GS]-C-[GA]-E-[EQ]-x-[LIVM].

NAME: Disintegrins signature. 

CONSENSUS: C-x(2)-G-x-C-C-x-[NQRS]-C-x-[FM]-x(6)-C-[RK].

NAME: Lambdoid phages regulatory protein CIII signature.
CONSENSUS: E-S-x-L-x-R-x(2)-[KR]-x-L-x(4)-[KR](2)-x(2)-[DE]-x-L.

NAME: Chaperonins cpn60 signature.
CONSENSUS: A-[AS]-x-[DEQ]-E-x(4)-G-G-[GA].

NAME: Chaperonins cpn10 signature.

CONSENSUS: fUVT4FY]-x-P-[IL^x-[DENHKRMLi^ 
CONSENSUS: [LIVMFY](3).
NAME: Chaperonins TCP-1 signature 1.

CONSENSUS: [RKEL]-[ST]-x-[LMFY]-C-P-x-[GSA]-x-x-K-[LIVMF](2).
NAME: Chaperonins TCP-1 signature 2. 

CONSENSUS: [LIVM]-[TS]-[NK]-D-[GA]-[AVNHK]-[TAV]-[LIVM](2)-x(2)-[LIVM]-x-[LIVM]-x-
CONSENSUS: [SNH]-[PQH].

NAME: Chaperonins TCP-1 signature 3.

CONSENSUS: Q-[DEK]-x-x-[LIVMGTA]-[GA]-D-G-T.

NAME: Heat shock hsp20 proteins family profile. 

NAME: Heat shock hsp70 proteins family signature 1.
CONSENSUS: [IV]-D-L-G-T-[ST]-x-[SC].

NAME: Heat shock hsp70 proteins family signature 2. 

CONSENSUS: [LIVMF]-[LIVMFY]-[DN]-[LIVMFS]-G-[GSH]-[GS]-[AST]-x(3)-[ST]-[LIVM]-
CONSENSUS: [LIVMFC].

NAME: Heat shock hsp70 proteins family signature 3. 
CONSENSUS: [UVMY]-x-[LIVMFhx-G-G-x-[ST]-x-^ 

NAME: Heat shock hsp90 proteins family signature. 
CONSENSUS: Y-x-[NQH]-K-[DE]-[IVA]-F-L-R-[ED].

NAME: Chaperonins clpA/B signature 1. 

CONSENSUS: D-[AI]-[SGA]-N-[LIVMF](2)-K-[PT]-x-L-x(2)-G.
NAME: Chaperonins clpA/B signature 2. 

CONSENSUS: R-[LIVMFY]-D-x-S-E-[LIVMFY]-x-E-[KRQ]-x-[STA]-x-[STA]-[KR]-[LIVM]-x-G-
CONSENSUS: [STA]. 

NAME: Nt-dnaJ domain signature. 

CONSENSUS: [FY]-x<2)-[UVMA^x(3^[F^^^^^^ 

NAME: dnaJ domain profile. 

NAME: CXXCXGXG dnaJ domain signature. 

CONSENSUS: C-[DEGSTHKR]-x-C-x-G-x-[GK]-[AGSDM]-x(2)-[GSNKR]-x(4,6)-C-x(2,3)-C-x-G-x-G.
NAME: grpE protein signature. 

CONSENSUS: [I^HDN>[PHEAl-x<2)-[HM]-x-A-[LI\M^ 
CONSENSUS: [LIVMHRIJ-x-lSA]-x-V-x-nV]. 

NAME: Bacterial type II secretion system protein C signature. 
CONSENSUS: P-x(6^F^4>-L-x(3)-r>(LiVMl-A-[LIVN^ 

NAME: Bacterial type II secretion system protein D signature. 

CONSENSUS: [GR]-[DEQKG]-[STVM]-[LIVMA](3)-[GA]-G-[LIVMFY]-x(11)-[LIVM]-P-
CONSENSUS: [UVMFWGSHLrVWFMGSAEl-x-tL^^ 

NAME: Bacterial type II secretion system protein E signature. 
CONSENSUS: [LIVM]-R-x(2)-P-D-x-[LIVM](3)-G-E-[LIVM]-R-D.

NAME: Bacterial type II secretion system protein F signature.
CONSENSUS: [KRQ]-[LIVMA)-x<2MSAIVMLIVM)-x-i^ 
CONSENSUS: [LMY]-x(3)-[LIVMF](2)-P.

NAME: Bacterial type II secretion system protein N signature. 
CONSENSUS: G-T-L-W-x-G-x(11)-L-x(4)-W.

NAME: Bacterial export FHIPEP family signature. 

CONSENSUS: R-[LIVM]-[GSA]-E-V-[GSA]-A-R-F-[STV]-L-D-[GSA]-M-P-G-K-Q-M-[GSA]-I-D-
CONSENSUS: [GSA]-D.

NAME: Protein secA signature.

CONSENSUS: [IV]-x-[IV]-[SA]-T-[NQ]-M-A-G-R-G-x-D-I-x-L.
NAME: Protein secY signature 1. 

CONSENSUS: [GST]-[UVMF](2Vx-[UVM]<i-[UV^ 
CONSENSUS: [LIVMFAT](3)-Q-[LIVMFA](2).
NAME: Protein secY signature 2.

CONSENSUS: [UVMFYW](2)-x-[DE]-x-[LIVMfl-[^^ 
CONSENSUS: [LIVMF](3).

NAME: Protein secE/sec61-gamma signature.

CONSENSUS: [LIVMFY]-x(2)-[DENQGA]-x(4)-[LIVMTA]-x-[KRV]-x(2)-[KW]-P-x(3)-[SEQ]-x(7)-
CONSENSUS: [LIVT]-[LIVGA]-[LIVFGAST].

NAME: Gram-negative pili assembly chaperone signature. 
CONSENSUS: [LIVMFYMAP^-x-(DNS]-[KREQ]-E-[STO^^ 
CONSENSUS: x(2)-[LIVM]-P-[PAS].

NAME: Fimbrial biogenesis outer membrane usher protein signature. 

CONSENSUS: [VL]-[PASQ]-[PAS]-G-[PAD]-[FY]-x-[LI]-[DNQSTAP]-[DNH]-[LIVMFY].

NAME: SRP54-type proteins GTP-binding domain signature.
CONSENSUS: P-[UVM]-x-[nrL]-[UVMATHGS]-x^ 

NAME: Cytochrome c oxidase assembly factor COX10/ctaB/cyoE signature.
CONSENSUS: [ED]-x-IM(2>-M-x-R-T-x(2)-R-x(4>-G. 

NAME: Cyclin-dependent kinases regulatory subunits signature 1.

CONSENSUS: Y-S-x-[KR]-Y-x-[DE](2)-x-[FY]-E-Y-R-H-V-x-[LV]-[PT]-[KRP].

NAME: Cyclin-dependent kinases regulatory subunits signature 2. 
CONSENSUS: H-x-P-E-x-H-[IV]-L-L-F-[KR].

NAME: Pentaxin family signature.
CONSENSUS: H-x-C-x-[ST]-W-x-[ST].

NAME: Immunoglobulins and major histocompatibility complex proteins signature. 
CONSENSUS: [FY]-x-C-x-[VA]-x-H.

NAME: Prion protein signature 1. 

CONSENSUS: A-G-A-A-A-A-G-A-V-V-G-G-L-G-G-Y. 
NAME: Prion protein signature 2.

CONSENSUS: E-x-[ED]-x-K-[LIVM](2)-x-[KR]-[LIVM](2)-x-[QE]-M-C-x(2)-Q-Y.
NAME: Cyclins signature. 

CONSENSUS: R-x(2)-[LIVMSA]-x(2)-[FYWS]-[LIVM]-x(8)-[LIVMFC]-x(4)-[LIVMFYA]-x(2)-
CONSENSUS: [STAGC]-[LIVMFYQ]-x-[LIVMFYC]-[LIVMFY]-D-[RKH]-[LIVMFYW].

NAME: Proliferating cell nuclear antigen signature 1. 

CONSENSUS: [GA]-[LIVMF]-x-[LIVMA]-x-[SAV]-[LIVM]-D-x-[NSAE]-[HKR]-[VI]-x-[LY]-
CONSENSUS: [YGA]-x-[LIVM]-x-[LIVM]-x(4)-F.

NAME: Proliferating cell nuclear antigen signature 2. 
CONSENSUS: [RKA]-C-[DEHRH]-x(3)-[LrVT4F] 
CONSENSUS: [LIVMF](2).

NAME: Actin-depolymerizing proteins signature. 
CONSENSUS: P-[DE]-x.[SA]-x-[LIVNrr)-[KR)^^^ 
CONSENSUS: [KR]. 

NAME: BCL2-like apoptosis inhibitors (spans part of BH3, BH1 and BH2).

NAME: Apoptosis regulator, Bcl-2 family BH1 domain signature. 
CONSENSUS: [LVME]-[rT>x-[GSDHGL]-x(l,2^^ 
CONSENSUS: [LIVMF](2)-x-F-[GSAE]-[GSARY].

NAME: Apoptosis regulator, Bcl-2 family BH2 domain signature. 
CONSENSUS: W-[LIM]-x(3)-[GR]-G-[WQ]-[DENSAV]-x-[FLGA]-[LIVFTC].

NAME: Apoptosis regulator, Bcl-2 family BH3 domain signature. 
CONSENSUS: [LIVAT]-x(3)-L-[KARQ]-xirVAL]-G-^^ 
CONSENSUS: [NSR]. 

NAME: Apoptosis regulator, Bcl-2 family BH4 domain signature. 

CONSENSUS: [DS]-[NT]-R-[AE]-[LI]-V-x-[KD]-[FY]-[LIV]-[GHS]-Y-K-L-[SR]-Q-[RK]-G-
CONSENSUS: [HY]-x-[CW].

NAME: Apoptosis regulator, Bcl-2 family BH4 domain profile.
NAME: Arrestins signature.

CONSENSUS: [FY]-R-Y-G-x-[DE](2)-x-[DEHUV^ 

NAME: AAA-protein family signature. 
CONSENSUS: [LIVNrn-x-[UVMTI-[UVMF]^ 
CONSENSUS: x-R. 

NAME: Ubiquitin domain signature. 

CONSENSUS: K-x(2)-[LIVM]-x-[DESAK]-x(3)-[LIVM]-[PA]-x(3)-Q-x-[LIVM]-[LIVMC]-
CONSENSUS: [LIVMFY]-x-G-x(4)-[DE].

NAME: Ubiquitin domain profile. 

NAME: ADP-ribosylation factors family signature.

CONSENSUS: [HRQT]-x-[FYWI]-x-[LIVM]-x(4)-A-x(2)-G-x(2)-[LIVM]-x(2)-[GSA]-[LIVMF]-x-
CONSENSUS: [WK]-[LIVM].

NAME: GTP-binding nuclear protein ran signature. 
CONSENSUS: D-T-A-G-Q-E-K-[LF]-G-G-L-R-[DE]-G-Y-Y.

NAME: SAR1 family signature.

CONSENSUS: R-x-[LIVM]-E-V-F-M-C-S-[LIVM](2)-x-[KRQ]-x-G-Y-x-E-[AG]-[FI]-x-W-[LIVM]-
CONSENSUS: x-Q-Y. 

NAME: Band 7 protein family signature. 

CONSENSUS: R-x(2)-[LIV]-[SAN]-x(6)-[LIV]-D-x(2)-T-x(2)-W-G-[LIV]-[KRH]-[LIV]-x-
CONSENSUS: [KR]-[LIV]-E-[LIV]-[KR].

NAME: Trp-Asp (WD) repeats signature. 

CONSENSUS: [LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-x(2)-
CONSENSUS: [LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN].

NAME: G-protein gamma subunit profile. 

NAME: Ras GTPase-activating proteins signature.
CONSENSUS: [GSl^-x-[UVMF]-[FY]-[U\MFY]-R-[^^ 
CONSENSUS: [SGAN]-P. 

NAME: Ras GTPase-activating proteins profile.

NAME: Guanine-nucleotide dissociation stimulators CDC24 family signature. 

CONSENSUS: L-x(2)-[LIVMFYW]-L-x(2)-P-[LIVM]-x(2)-[LIVM]-x-[KRS]-x(2)-L-x-[LIVM]-x-

CONSENSUS: [DEQ]-[LIVM]-x(3)-[ST].

NAME: Guanine-nucleotide dissociation stimulators CDC25 family signature. 
CONSENSUS: [GAP]-[CT]-V-P-[FY]-x(4)-[LIVMFY]-x-[DN]-[LIVM].

NAME: MARCKS family signature 1.
CONSENSUS: G-Q-E-N-G-H-V-[KR].

NAME: MARCKS family phosphorylation site domain. 

CONSENSUS: E-T-P-K(5)-x(0,1)-F-S-F-K-K-x-F-K-L-S-G-x-S-F-K-[KR]-[NS]-[KR]-K-E.

NAME: Stathmin family signature 1. 

CONSENSUS: P-[KQ]-[KR](2)-[DE]-x-S-L-[EG]-E.

NAME: Stathmin family signature 2. 
CONSENSUS: A-E-K-R-E-H-E-[KR]-E-V.

NAME: GTP-binding elongation factors signature. 

CONSENSUS: D-[KRSTGANQFYW]-x(3)-E-[KRAQ]-x-[RKQD]-[GC]-[IVMK]-[ST]-[IV]-x(2)-
CONSENSUS: [GSTACKRNQ].

NAME: Elongation factor 1 beta/beta'/delta chain signature 1.
CONSENSUS: [DE]-[DEG]-[DE](2)-[LIVMF]-D-L-F-G.

NAME: Elongation factor 1 beta/beta'/delta chain signature 2.
CONSENSUS: V-Q-S-x-D-[LIVM]-x-A-[FWM]-[NQ]-K-[LIVM].

NAME: Elongation factor 1 gamma chain profile. 

NAME: Elongation factor Ts signature 1. 
CONSENSUS: L-R-x(2)-T-[GDQ]-x-[GS]-[UVM^ 
NAME: Elongation factor Ts signature 2.

CONSENSUS: E-[LIVM]-N-[SCV]-[QE]-T-D-F-V-[SA]-[KRN].
NAME: Elongation factor P signature. 

CONSENSUS: K-x-A-x(4)-G-x(2)-[LIV]-x-V-P-x(2)-[LIV]-x(2)-G.
NAME: Eukaryotic initiation factor 1A signature. 

CONSENSUS: [IM]-x-G-x-[GS]-[KRH]-x(4)-[CL]-x-D-G-x(2)-R-x(2)-[RH]-I-x-G.
NAME: Eukaryotic initiation factor 4E signature. 

CONSENSUS: [DE]-[IFY]-x(2)-F-[KR]-x(2)-[LIVM]-x-P-x-W-E-[DV]-x(5)-G-G-[KR]-W.

NAME: Eukaryotic initiation factor 5A hypusine signature. 
CONSENSUS: [PT]-G-K-H-G-x-A-K.

NAME: Initiation factor 2 signature. 

CONSENSUS: G-x-[LIVM]-x(2)-L-[KR]-[KRHNS]-x-K-x(5)-[LIVM]-x(2)-G-x-[DEN]-C-G.
NAME: Initiation factor 3 signature. 

CONSENSUS: [KR]-[LIVM](2MDNMr^-[GSNHKRHL^ 

NAME: Translation initiation factor SUI1 signature.

CONSENSUS: [LIVM]-[EQ]-[LIVM]-Q-G-[DEN]-[KHQ]-[KRV].

NAME: Prokaryotic-type class I peptide chain release factors signature.
CONSENSUS: [AR]-[STA]-x-G-x-G-G-Q-[HNGCS]-V-N-x(3)-[ST]-A-[IV].

NAME: Transcription termination factor nusG signature. 
CONSENSUS: [LIVM]-F-G-[KRW]-x-T-P-[IV]-x-[LIVM].

NAME: Calponin family repeat. 

CONSENSUS: lLIVM]-x-[l^]«Q-[MAS]-G-[STrl-[NTMK^^ 

NAME: CAP protein signature 1. 

CONSENSUS: [LIVM](2)-x-R-L-[DE]-x(4)-R-L-E.

NAME: CAP protein signature 2. 

CONSENSUS: D-[LIVMFY]-x-E-x-[PA]-x-P-E-Q-[LIVMFY]-K.
NAME: Calreticulin family signature 1. 

CONSENSUS: [KRHN]-x-[DEQN]-[DEQNK]-x(3)-C-G-G-[AG]-[FY]-[LIVM]-[KN]-[LIVMFY](2).

NAME: Calreticulin family signature 2. 
CONSENSUS: [LIVM](2)-F-G-P-D-x-C-[AG].

NAME: Calreticulin family repeated motif signature. 

CONSENSUS: [IV]-x-D-x-[DENST]-x(2)-K-P-[DEH]-D-W-[DEN].

NAME: Calsequestrin signature 1.

CONSENSUS: [EQ]-[DE]-G-L-[DN]-F-P-x-Y-D-G-x-D-R-V.
NAME: Calsequestrin signature 2. 

CONSENSUS: [DE]-L-E-D-W-[LIVM]-E-D-V-L-x-G-x-[LIVM]-N-T-E-D-D-D.

NAME: S-100/ICaBP type calcium binding protein signature. 
CONSENSUS: [UVMFYW](2>x(2MLK]-D-x(3HDN]-x(3>^ 
CONSENSUS: [LIVMFS]-[LIVMF].

NAME: Hemolysin-type calcium-binding region signature.
CONSENSUS: D-x-[LI]-x(4)-G-x-D-x-[LI]-x-G-G-x(3)-D.

NAME: HlyD family secretion proteins signature. 

CONSENSUS: [UVM]-xa)-G-[LM]«x(3MSTGAV]«x-[U^^ 

CONSENSUS: [LIVMFYW](2)-x-[LIVMFYW](3).

NAME: P-II protein urydylation site.
CONSENSUS: Y-[KR]-G-[AS]-[AE]-Y.

NAME: P-II protein C-terminal region signature.
CONSENSUS: lST>x(3KHDY>G-pCR]-[IVMF^ 

NAME: 14-3-3 proteins signature 1. 

CONSENSUS: R-N-L-[LIV]-S-[VG]-[GA]-Y-[KN]-N-[IVA].
NAME: 14-3-3 proteins signature 2.

CONSENSUS: Y-K-(DEl-S-T-L-l-[IM]-(^L-tLI^-[RHq-D-N-[Ln.T.[I^]-W-rrA^-lSADJ^ 

NAME: ATP1G1 / PLM / MAT8 family signature. 

CONSENSUS: [DNS]-x-F-x-Y-D-x(2)-[ST]-[LIVM]-[RQ]-x(2)-G.

NAME: BTG1 family signature 1. 

CONSENSUS: Y-x(2>.[HP3-W.[FYl.[AP]-E.x-P-x-K-G-x-[GA]-[^^-R^-[IV^[lml-[IV]. 
NAME: BTG1 family signature 2. 

CONSENSUS: [LV]-P-x-[DE]-[LM]-[ST]-[LIVM]-W-[IV]-D-P-x-E-V-[SC]-x-[RQ]-x-G-E.
NAME: Cullin family signature. 

CONSENSUS: [LIV]-K-x(2)-[LIV]-x(2)-H-[DEQ]-[KRHNQ]-x-Y-[LIVM]-x-R-x(6,7)-[FY]-x-
CONSENSUS: Y-x-[SA]>. 

NAME: Cullin family profile. 

NAME: Enhancer of rudimentary signature. 

CONSENSUS: Y-D-I-[SA]-x-L-[FY]-x-F-[IV]-D-x(3)-D-[LIV]-S.
NAME: G10 protein signature 1. 

CONSENSUS: L-C-C-x-[KR]-C-x(4)-[DE]-x-N-x(4)-C-x-C-R-V-P.

NAME: G10 protein signature 2.

CONSENSUS: C-x-H-C-G-C-[KRH]-G-C-[SA].

NAME: Glucokinase regulatory protein family signature. 

CONSENSUS: G-[PA]-E-x-[LIV]-[STA]-G-S-[ST]-R-[LIVM]-K-[STGA](3)-x(2)-K.
NAME: GTP1/OBG family signature. 

CONSENSUS: D-[LIVM]-P-G-[LIVM](2)-[DEY]-[GN]-A-x(2)-G-x-G.
NAME: HIT family signature. 

CONSENSUS: rNQA]-x(4HGAV]-x-[QF]-x-[LIVM]-x-H-[LW^ 
CONSENSUS: [PSGA].

NAME: Caseins alpha/beta signature. 
CONSENSUS: C-L-[LV]-A-x-A-[LVF]-A.

NAME: Clathrin adaptor complexes medium chain signature 1. 

CONSENSUS: [IVT]-[GSP]-W-R-x(2,3)-[GAD]-x(2)-[HY]-x(2)-N-x-[LIVMAFY](3)-D-[LIVM]-
CONSENSUS: [LIVMT]-E. 

NAME: Clathrin adaptor complexes medium chain signature 2. 
CONSENSUS: [LIV]-x-F-I-P-P-x-G-x-[LIVMFY]-x-L-x(2)-Y.

NAME: Clathrin adaptor complexes small chain signature. 
CONSENSUS: [LIVM](2)-Y-[KR]-x(4)-L-Y-F. 

NAME: Ependymins signature 1.

CONSENSUS: F-E-E-G-x-[LIVMF]-Y-[ED]-I-D-x(2)-N-[QE]-S-C-[RKH](2).
NAME: Ependymins signature 2. 

CONSENSUS: [QE]-[LIVMA]-F-x(2)-P-[STA]-[FY]-C-[DE]-[GA]-[LIVM]-x(2)-[DE](2).
NAME: Syntaxin / epimorphin family signature. 

CONSENSUS: [RQ]-x(3)-fLIVMAl-xa)^IVM]-fESrfl-x(2MLIV^ 
CONSENSUS: [LJVM]-[re}-x(2)-[LIVM]-x(3>[LIV^ 
CONSENSUS: [LIVMF]-[DESV]-x(2)-[LIVM].

NAME: Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signature 1.
CONSENSUS: [GDER]-H-[FYWH]-T-Q-[LIVM](2)-W-x(2)-[STN].

NAME: Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signature 2.
CONSENSUS: [UVMr^-fUVMFn-x-C-[NQRHS>^ 

NAME: Fetuin family signature 1.

CONSENSUS: C-x(56)-C-x(10)-C-x(13)-C-x(17,18)-C-x(13)-C-x(2)-C-x(58)-C-x(10,11)-
CONSENSUS: C-x(10,12)-C-x(16,22)-C.

NAME: Fetuin family signature 2. 
CONSENSUS: L-E-T-x-C-H-x-L-D-P-T-P.
NAME: Legume lectins beta-chain signature.
CONSENSUS: [LIV]-[STAG]-V-[DEQV]-[FLI]-D-[ST].

NAME: Legume lectins alpha-chain signature. 

CONSENSUS: [LIV]-x-[EDQ]-[FYWKR]-V-x-[LIV]-G-[LF]-[ST].

NAME: Vertebrate galactoside-binding lectin signature. 

CONSENSUS: W-[GEK]-x-[EQ]-x-[KRE]-x(3,6)-[PCTF]-[LIVMF]-[NQEGSKV]-x-[GH]-x(3)-
CONSENSUS: [DENKHS]-[LIVMFC].

NAME: Lysosome-associated membrane glycoproteins duplicated domain signature. 
CONSENSUS: [STA]-C-[LIVM]-[LIVMFYW]-A-x-[LIVMFYW]-x(3)-[LIVMFYW]-x(3)-Y.

NAME: LAMP glycoproteins transmembrane and cytoplasmic domain signature. 
CONSENSUS: C-x(2)-D-xa4MLIVM](2)-P-[UVM]-x-[l^ 

CONSENSUS: x-[LIVM](4)-A-[FY]-x-[LIVM]-x(2)-[KR]-[RH]-x(1,2)-[STAG](2)-Y-[EQ].

NAME: Glycophorin A signature. 

CONSENSUS: I-I-x-[GAC]-V-M-A-G-[LIVM](2).

NAME: PMP-22 / EMP / MP20 family signature 1.

CONSENSUS: [LIVMF](4)-[SA]-T-x(2)-[DNKS]-x-W-x(9,13)-[LIV]-W-x(2)-C.
NAME: PMP-22 / EMP / MP20 family signature 2. 

CONSENSUS: [RQ]-[AV]-x-M-[IV]-L-S-x-[LI]-x(4)-[GSA]-[LIVMF](3).

NAME: Oxysterol-binding protein family signature. 
CONSENSUS: E-[KQ]-x-S-H-[HR]-P-P-x-[STACF]-A. 

NAME: Yeast PIR proteins repeats signature. 

CONSENSUS: S-Q-[IV]-[STGNH]-D-G-Q-[LIV]-Q-[AIV]-[STA].

NAME: Seminal vesicle protein I repeats signature. 

CONSENSUS: [IVM]-x-G-Q-D-x-V-K-x(5)-[KN]-G-x(3)-[STLV].

NAME: Seminal vesicle protein II repeats signature. 
CONSENSUS: [GSA]-Q-x-K-S-[FY]-x-Q-x-K-[SA].

NAME: Serum amyloid A proteins signature. 

CONSENSUS: A-R-G-N-Y-[ED]-A-x-[QKR]-R-G-x-G-G-x-W-A. 

NAME: Spermadhesins family signature 1.
CONSENSUS: C-G-x(2)-[LI]-x(4)-G-x-I-x(9)-C-x-W-T.

NAME: Spermadhesins family signature 2. 

CONSENSUS: C-x-K-E-x-[LIVM]-E-[LIVM]-x-[DE]-x(3)-[GS]-x(5)-K-x-C.

NAME: Stress-induced proteins SRP1/TIP1 family signature.
CONSENSUS: P-W-Y-[ST](2)-R-L. 

NAME: Glypicans signature. 

CONSENSUS: C-x(2)-C-x-G-[LIVM]-x(4)-P-C-x(2)-[FY]-C-x(2)-[LIVM]-x(2)-G-C.
NAME: Syndecans signature. 

CONSENSUS: [FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y.
NAME: Tissue factor signature. 

CONSENSUS: W-K-x-K-C-x(2)-T-x-[DEN]-T-E-C-D-[LIVM]-T-D-E.
NAME: Translationally controlled tumor protein signature 1. 

CONSENSUS: [IA]-G-[GAS]-N-[PA]-S-A-E-[GDE]-[PAGE]-x(0,1)-[DEG]-x-[DEN]-x(2)-[DE].

NAME: Translationally controlled tumor protein signature 2. 
CONSENSUS: [FLHI^-[IVT]-G-E-x-[MA]-x(2,5HDE^^^ 
CONSENSUS: [DE].

NAME: Tub family signature 1.

CONSENSUS: F-[KHQ]-G-R-V-[ST]-x-A-S-V-K-N-F-Q.
NAME: Tub family signature 2. 

CONSENSUS: A-F-[AG]-I-[SAC]-[LIVM]-[ST]-S-F-x-[GST]-K-x-A-C-E.

NAME: HCP repeats signature. 
CONSENSUS: H-R-H-R-G-H-x(2)-[DE](7).
NAME: Bacterial ice-nucleation proteins octamer repeat.
CONSENSUS: A-G-Y-G-S-T-x-T. 

NAME: Cell cycle proteins ftsW / rodA / spoVE signature. 
CONSENSUS: [NV]-x(5MGTRHLIVMA]-x-P-[PTU^ 
CONSENSUS: G-G-lSTNMSAK 

NAME: Enterobacterial virulence outer membrane protein signature 1.
CONSENSUS: G-[LIVMFY]-N-[LIVM]-K-Y-R-Y-E.

NAME: Enterobacterial virulence outer membrane protein signature 2. 
CONSENSUS: [FYW]-x(2)-G-x-G-Y-[KR]-F>.

NAME: Hydrogenases expression/synthesis hypA family signature. 

CONSENSUS: F-[CSA]-[FY]-[DE]-[LIVA](2)-x(3)-[ST]-[LIVM]-x(16)-C-x(2)-C-x(12,15)-
CONSENSUS: C-P-x-C. 

NAME: Hydrogenases expression/synthesis hupF/hypC family signature. 
CONSENSUS: <M-C-[LIV]-[GA]-[LIV]-P-x-[QKR]-[LIV].

NAME: Staphylocoagulase repeat signature. 

CONSENSUS: A-R-P-x(3)-K-x-S-x-T-N-A-Y-N-V-T-T-x(2)-[DN]-G-x(3)-Y-G.

NAME: 11-S plant seed storage proteins signature.

CONSENSUS: N-G-x-[DE](2)-x-[LIVMF]-C-[ST]-x(11,12)-[PAG]-D.

NAME: Dehydrins signature 1. 

CONSENSUS: S(5)-[DE]-x-[DE]-G-x(1,2)-G-x(0,1)-[KR](4).

NAME: Dehydrins signature 2. 

CONSENSUS: [KR]-[LIM]-K-[DE]-K-[LIM]-P-G.

NAME: Germin family signature. 

CONSENSUS: G-x(4)-H-x-H-P-x-A-x-E-[LIVM].

NAME: Oleosins signature.

CONSENSUS: [AG]-[ST]-x(2)-[AG]-x(2)-[LIVM]-[SAD]-T-P-[LIVMF](4)-F-S-P-[LIVM](3)-
CONSENSUS: P-A. 

NAME: Small hydrophilic plant seed proteins signature.
CONSENSUS: G-[EQ]-T-V-V-P-G-G-T. 

NAME: Pathogenesis-related proteins Bet v I family signature.

CONSENSUS: G-x(2)-[LIVMF]-x(4)-E-x(2)-[CSTAEN]-x(8,9)-[GND]-G-[GS]-[CS]-x(2)-K-x(4)-
CONSENSUS: [FY]. 

NAME: Pollen proteins Ole e I family signature. 
CONSENSUS: [EQ]-G-x-V-Y-C-D-T-C-R.

NAME: Thaumatin family signature. 

CONSENSUS: G-x-[GF]-x-C-A-T-[GA]-D-C-x(1,2)-G-x(2,3)-C.
NAME: Mrp family signature. 

CONSENSUS: W-x(2)-[LIVM]-D-[LIVMY](4)-D-x-P-P-G-T-[GS]-D.

NAME: Glucose inhibited division protein A family signature 1.
CONSENSUS: [GS]-P-x-Y-C-P-S-[LIVM]-E-x-K-[LIVM]-x-[KR]-F.

NAME: Glucose inhibited division protein A family signature 2. 
CONSENSUS: A.G^x-[NT]^x(2)-G.Y-x-E-[SAG](3MQS]^-[U^ 

NAME: NOL1/NOP2/sun family signature.

CONSENSUS: [FV]-D-[KRA]-[LIVMA]-L-x-D-[AV]-P-C-[ST]-[GA].
NAME: PET112 family signature.

CONSENSUS: [DN]-x-[DN]-R-x(3)-P-L-[LIV]-E-[LIV]-x-[ST]-x-P.
NAME: Protein smpB signature. 

CONSENSUS: [TA]-G-[LIVM]-x-L-x-G-x-E-[LIVM]-[KQ]-[SA]-[LIVM].

NAME: Hypothetical cof family signature 1. 
CONSENSUS: [UVFr-AN]-fUVMFAKx(2)-r>[LIVM^ 
NAME: Hypothetical cof family signature 2.

CONSENSUS: [LIVMFC]-G-D-[GSANQ]-x-N-D-x(3)-[LIMFY]-x(2)-[AV]-x(2)-[GSCP]-x(2)-
CONSENSUS: [LMP]-x(2)-[GAS].

NAME: RIO1/ZK632.3/MJ0444 family signature. 
CONSENSUS: [LIVM]-V-H-[GA]-D-L-S-E-[FY]-N-x-[LIVM].

NAME: SUA5/yciO/yrdC family signature. 

CONSENSUS: [LIVMTA](3)-[LIVMFYC]-[PG]-T-[DE]-[STA]-x-[FY]-[GA]-[LIVM]-[GS].

NAME: Uncharacterized protein family UPF0001 signature. 
CONSENSUS: [FW]-H-[FM]-[IV]-G-x-[LIV]-Q-x-[NKR]-K-x(3)-[LIV].

NAME: Uncharacterized protein family UPF0003 signature. 

CONSENSUS: G-x-V-x(2)-[LIV]-x(3)-[SA]-x(6)-D-x(3)-[LIVT](3)-P-N-x(2)-[LIVMF](2)-
CONSENSUS: x(5)-N.

NAME: Uncharacterized protein family UPF0004 signature.

CONSENSUS: [UVMJ-x-lLIVNni-x(2)-G-C.x(3)^-[STAN]-{I^3^-x-[LIVM]-x(4)-G. 
NAME: Uncharacterized protein family UPF0005 signature. 

CONSENSUS: G-[LIVM](2)-[SA]-x(5,8)-G-x(2)-[LIVM]-G-P-x-L-x(4)-[SAG]-x(4,6)-
CONSENSUS: [LIVM](2)-x(2)-A-x(3)-T-A-[LIVM](2)-F.

NAME: Uncharacterized protein family UPF0006 signature 1.
CONSENSUS: [LIVMFY](2)-D-[STA]-H-x-H-[LIVMF]-[DN].

NAME: Uncharacterized protein family UPF0006 signature 2.
CONSENSUS: P.tUVM]-x-(UVMl-H.x-R-x^[TA]-x-[DE]. 

NAME: Uncharacterized protein family UPF0006 signature 3. 

CONSENSUS: [LVSA]-[LIVA]-x(2)-[LIVM]-[PS]-x(3)-L-[LIVM]-[LIVMS]-E-T-D-x-P.

NAME: Uncharacterized protein family UPF0007 signature. 
CONSENSUS: V-L-OVJ-H-D-IGAJ-A-R. 

NAME: Uncharacterized protein family UPF0011 signature.
CONSENSUS: S-D-A-G-x-P-x-[LIV]-[SN]-D-P-G.

NAME: Uncharacterized protein family UPF0012 signature. 
CONSENSUS: [GTA]-x(2)-[IVTl^:-Y-D-[LIVM]-x-F-P-x(9)-G. 

NAME: Uncharacterized protein family UPF0015 signature. 

CONSENSUS: [DE]-[LIVMF](3)-R-T-[SG]-G-x(2)-R-x-S-x-[FY]-[LIVM](2)-W-Q.

NAME: Uncharacterized protein family UPF0016 signature. 
CONSENSUS: E-P-IVM]-G-D-K-T-F-[LIVMF](2)-A. 

NAME: Uncharacterized protein family UPF0017 signature.

CONSENSUS: D-x(8)-[GN]-(LFY]-x(4)-[DET)-[LY]-Y-x(3)-[ST]-x(7MIVl-xa)-[PSl-x- 
CONSENSUS: [LIVM]-x-[LIVM]-x(3MD^D. 

NAME: Uncharacterized protein family UPF0019 signature. 

CONSENSUS: L-P-V-[VT]-[NQL]-F-[AT]-A-G-G-[LIV]-A-T-P-A-D-A-A-[LM].

NAME: Uncharacterized protein family UPF0020 signature.
CONSENSUS: D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E.

NAME: Uncharacterized protein family UPF0021 signature.
CONSENSUS: C-K-x(2)-F-x(4)-E-x(22,23)-S-G-G-K-D.

NAME: Uncharacterized protein family UPF0023 signature.
CONSENSUS: D-x-D-E-[LIV]-L-x(4)-V-F-x(3)-S-K-G.

NAME: Uncharacterized protein family UPF0024 signature.
CONSENSUS: G-x-K-D-[KR]-x-A-[LV]-T-x-Q-x-[LIVF]-[SGC].

NAME: Uncharacterized protein family UPF0025 signature. 
CONSENSUS: D^[UV]-x(2)^H-[ST)-H-x( 1 2MUVMFJ-N-P-G. 

NAME: Uncharacterized protein family UPF0027 signature. 
CONSENSUS: Q4UVM]-x-N-x-A-x-[UVM]-P-x-I-x(6V 

NAME: Uncharacterized protein family UPF0028 signature. 
CONSENSUS: [GA]-[GSJ-G-[GA3-A-R-G-x.[SA].H-x-G-x(9)-tIV)-x-[IVl.D-x(2).tGA]-G^-S.
CONSENSUS: x-G. 

NAME: Uncharacterized protein family UPF0029 signature. 

CONSENSUS: G-x(2HUVM](2^x(2)-[UVM]-x(4>-[LIVM].x(5).[LIVM](2)^.R.{FYWJ(2H3- 
CONSENSUS: G-x(2MLIVM]-G. 

NAME: Uncharacterized protein family UPF0030 signature. 
CONSENSUS: [GA]-L-I-[LIV]-P-G-G-E-S-T-[STA].

NAME: Uncharacterized protein family UPF0031 signature 1. 

CONSENSUS: [SAV]-[IVW]-(LVAJ-{LIV]-G-[PNS]^-L.CGP]-x-PENQT]. 

NAME: Uncharacterized protein family UPF0031 signature 2. 
CONSENSUS: [GA]-G-x-G-D-[TV]-[LT]-[STA]-G-x-[LIVM].

NAME: Uncharacterized protein family UPF0032 signature.

CONSENSUS: Y-x(2)-F-[LIVMA](2)-x-L-x(4)-G-x(2)-F-[EQ]-[LIVMF]-P-[LIVM].

NAME: Uncharacterized protein family UPF0033 signature.
CONSENSUS: L-[DN]-x(2)-[TAG]-x(2)-C-P-x-P-x-[LIVM].

NAME: Uncharacterized protein family UPF0034 signature.
CONSENSUS: [UVMHDNGMLIVM]-N-x-G^-P-x(3^ 

NAME: Uncharacterized protein family UPF0035 signature. 
CONSENSUS: L-L-T-x-R-(SA]-x(3HM3)-G-x(3)-F-P-G-G. 

NAME: Uncharacterized protein family UPF0036 signature. 

CONSENSUS: H-x-S-G-H-[GA]-x(3)-[DE]-x(3)-[LM]-x(5)-P-x(3)-[LIVM]-P-x-H-G-[DE].

NAME: Uncharacterized protein family UPF0038 signature. 
CONSENSUS: G-x-[LH-x-R-x(2)-L-x(4)-F-x(8)-[LIV]-x(5)-P-x.[LIV]. 

NAME: Uncharacterized protein family UPF0044 signature. 

CONSENSUS: L-[ST]-x(3)-K-x(3)-[KR]-[SGA]-x-[GA]-H-x-L-x-P-[LIV]-x(2)-[LIV]-[GA]-
CONSENSUS: x(2)-G.

NAME: Uncharacterized protein family UPF0047 signature.
CONSENSUS: S-X(2>[LIV]-x-[UV]-x(2)-G.x(4)-G-T-W^-x-rLIV]. 

NAME: Uncharacterized protein family UPF0054 signature. 
CONSENSUS: H-[GS]-x-L-H-L-CLn-G-[FYW1-D-H. 

NAME: Uncharacterized protein family UPF0057 signature. 

CONSENSUS: [LIV]-x-[STA]-[LIVF](3)-P-P-[LIVA]-[GA]-[IV]-x(4)-[GKN].

NAME: Hypothetical YER057c/yjjV family signature.

CONSENSUS: P-[AT]-R-[SA]-x-[LIVMY]-x(2)-[AK]-x-L-P-x(4)-[LIVM]-E.

NAME: Hypothetical hesB/yadR/yfhF family signature.

CONSENSUS: F-x-[LIVMFY]-x-N-[PG]-[NSK]-x(4)-C-x-C-[GS]-x-S-F.

NAME: Hypothetical yabO/yceC/sfhB family signature. 

CONSENSUS: [>mY]-R-[LI]-D-x(2)-T-[ST]-G-rLIVMAl-[LIVMF](2HLrVMFGJ-[SGAq. 
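
The CONSENSUS entries above follow PROSITE-style pattern notation: elements are joined by "-", "x" stands for any residue, "[...]" lists the residues allowed at a position, and "(n)" or "(n,m)" gives a repeat count. As an illustration only, such a pattern might be translated into a regular expression and scanned against a protein sequence as sketched below. The function name prosite_to_regex and the example sequence are hypothetical and are not part of the original listing; the "{...}" (excluded residues) and "<" / ">" (terminus anchors) handling reflects general PROSITE syntax rather than anything shown in this listing.

```python
import re

# One pattern element: optional "<", a core (x, a residue, [allowed], {excluded}),
# an optional repeat count "(n)" or "(n,m)", and an optional ">".
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")  # drop the trailing period
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        start, core, lo, hi, end = m.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(parts)

# Example: the Ole e I family signature against a made-up sequence.
regex = prosite_to_regex("[EQ]-G-x-V-Y-C-D-T-C-R.")
hit = re.search(regex, "AAEGKVYCDTCRAA")
print(regex, hit.start() if hit else None)
```

A pattern whose elements cannot be parsed (for example, one damaged in extraction) raises ValueError rather than being silently guessed at.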
We claim:

1. An assemblage, comprising at least one nucleic acid molecule having the
sequence of a clone selected from the group consisting of: hfbr2_16cl6; hfbr2_16f21; 
hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; hfbr2_16112; hfbr2_22f21; hfbr2_22hl3; 
hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; hfbr2_23bl0; hfbr2_23b21; 
hfbr2_23f2; hfbr2_23124; hfbr2_23nl6; hfbr2_23o24; hfbr2_23o5; hfbr2_2a2; hfbr2_2bl7;
hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; hfbr2_2cl8; hfbr2_2dl5; hfbr2_2dl7; hfbr2_2d20;
hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2il7; hfbr2_2kl4; hfbr2_2kl9; hfbr2_3bl6; 
hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; hfbr2_312; hfbr2_41ml5; hfbr2_62bll; hfbr2_62fl0; 
hfbr2_62119; hfbr2_62nl0; hfbr2_62ol7; hfbr2_64all; hfbr2_64al5; hfbr2_64cl6;
hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64jl8; hfbr2_64k24; hfbr2_64ol6; 
hfbr2_6al7; hfbr2_6b24; hfbr2_6i20; hfbr2_6ol7; hfbr2_71o20; hfbr2_72bl8; 
hfbr2_72dl3; hfbr2_72112; hfbr2_72ml6; hfbr2_72nl2; hfbr2_78c24; hfbr2_78dl3;
hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82c20; 
hfbrl_10c20; hfbr2_82el7; hfbrl_10el7; hfbr2_82e4;; hfbrl_10e4; hfbr2_82gl4;;
hfbrl_10gl4; hfbr2_82il7;; hfbrl_10; hfbr2_82i24;; hfbrl_10; hfbr2_82ml6;; hfbrl_10;
hfbr2_82m6;; hfbrl_10; hfkd2_lj9; hfkd2_24al5; hfkd2_24bl5; hfkd2_24e23; 
hfkd2_24n20; hfkd2_24p5; hfkd2_3il3; hfkd2_3ol7; hfkd2_46a6; hfkd2_46bl0; 
hfkd2_46dl3; hfkd2_46j20; hfkd2_46kl9; hfkd2_46m4; hfkd2_47a4; hfkd2_4b6; 
hfkd2_4c8; hfkd2_4kl4; hfkd2_4mll; hmcfl_lall; hmcfl_lc23; hmcfl_lel5; 
hmcfl_lgl3; hhtes3_ln3; htes3_14g5; htes3_14h21; htes3_14pl4; htes3_14p7; 
htes3_15al3; Htes3_15c24; htes3_15c6; htes3_15gl4; htes3_15hl; htes3_15i5; 
htes3_15jl8; Htes3_15j3; htes3_15kll; htes3_17fl0; htes3_17117; htes3_17nl2; 
htes3_17nl8; Htes3_18f3; htes3_1817; htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; 
htes3_lkll; htes3_20c21; htes3_20k2; htes3_20ml8; htes3_21d4; htes3_21jl5; 
htes3_21116; htes3_21n23; htes3_22c23; htes3_22g2; htes3_22nl3; htes3_23111; 
htes3_23nl9; Htes3_23nl9; htes3_26g22; htes3_27dl; htes3_27k4; htes3_27ol4; 
htes3_28dl4; htes3_2all; htes3_2al7; htes3_2dl5; htes3_2el2; htes3_2fl4; htes3_2g7; 
htes3_2hl; htes3_2hl5; htes3_2119; htes3_2ml8; htes3_2m20; htes3_2n9; htes3_2ol3; 
htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21; htes3_35g6; htes3_35kl6; 
htes3_35k24; htes3_35nl2; htes3_35n24; htes3_35n9; htes3_35pl7; htes3_35p22; 
htes3_4b4; htes3_4fl7; htes3_4f5; htes3_4h6; htes3_4ol9; htes3_50j4; htes3_50n06; 
htes3_50n23; htes3_6b21; htes3_6cll; htes3_6dl6; htes3_72kll; Htes3_72kl5;
htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; htes3_7j8; htes3_7pl0; htes3_7p9; 
htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; Htes3_8p7; Htes3_9e22; Htes3_9i20; 
Htes3_9k22; hutel_17k7; hutel_18cl2; hutel_18il9; hutel_18i4; hutel_1811; 
hutel_19fl9; hutel_19gl9; hutel_19g22; hutel_19hl7; hutel_19jll; hutel_li2; 
hutel_20bl9; hutel_20g21; hutel_20hl3; hutel_20mll; hutel_20m24; hutel_21dl5; 
hutel_22d2; hutel_22el2; hutel_22n2; hutel_22o2; hutel_23el3; hutel_23gll; 
hutel_24cl9; hutel_24ell; hutel_24j6; hutel_2h3; their complements; and variants 
thereof. 

2. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16cl6; hfbr2_16f21; 
hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; hfbr2_16112; hfbr2_22f21; hfbr2_22hl3; 
hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; hfbr2_23bl0; hfbr2_23b21; 
hfbr2_23f2; hfbr2_23124; ; hfbr2_23nl6; hfbr2_23o24; hfbr2_23o5; hfbr2_2a2; 
hfbr2_2bl7; hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; hfbr2_2cl8; hfbr2_2dl5; hfbr2_2dl7; 
hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2il7; hfbr2_2kl4; hfbr2_2kl9;
hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; hfbr2_312; hfbr2_41ml5; hfbr2_62bll; hfbr2_62fl0; 
hfbr2_62119; hfbr2_62nl0; hfbr2_62ol7; hfbr2_64all; hfbr2_64al5; hfbr2_64cl6; 
hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64jl8; hfbr2_64k24; hfbr2_64ol6; 
hfbr2_6al7; hfbr2_6b24; hfbr2_6i20; hfbr2_6ol7; hfbr2_71o20; hfbr2_72bl8; 
hfbr2_72dl3; hfbr2_72112; hfbr2_72ml6; hfbr2_72nl2; hfbr2_78c24; hfbr2_78dl3;
hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82c20; 
hfbrl_10c20; hfbr2_82el7; hfbrl_10el7; hfbr2_82e4; hfbrl_10e4; hfbr2_82gl4;
hfbrl_10gl4; hfbr2_82il7; hfbrl_10; hfbr2_82i24; hfbrl_10; hfbr2_82ml6; hfbrl_10;
hfbr2_82m6; hfbrl_10; their complements; and variants thereof.

3. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16f21; hfbr2_16k22;
hfbr2_22f21; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; hfbr2_23f2; ; hfbr2_23o24;
hfbr2_23o5; hfbr2_2a2; hfbr2_2cl; hfbr2_2cl8; hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl;
hfbr2_2hl0; hfbr2_2kl9; hfbr2_3fl6; hfbr2_312; hfbr2_62nl0; hfbr2_64all; hfbr2_64cl6;
hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64ol6; hfbr2_6al7; hfbr2_6i20; hfbr2_71o20;
hfbr2_72dl3; hfbr2_72ml6; hfbr2_72nl2; hfbr2_78dl3; hfbr2_78n23; hfbr2_7a24;
hfbr2_7e22; hfbr2_7j4; hfbr2_82ml6; and hfbrl_10.



4. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfkd2_lj9; hfkd2_24al5; 
hfkd2_24bl5; hfkd2_24e23; hfkd2_24n20; hfkd2_24p5; hfkd2_3il3; hfkd2_3ol7;
hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_46j20; hfkd2_46kl9; hfkd2_46m4;
hfkd2_47a4; hfkd2_4b6; hfkd2_4c8; hfkd2_4kl4; hfkd2_4mll; their complements; and
variants thereof. 

5. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfkd2_lj9; hfkd2_24e23; 
hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_4b6; hfkd2_4c8; their complements; and 
variants thereof. 

6. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hmcfl_lall; hmcfl_lc23; 
hmcfl_lel5; hmcfl_lgl3; their complements; and variants thereof. 

7. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hmcfl_lc23; hmcfl_lgl3; their
complements; and variants thereof. 

8. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hhtes3_ln3; htes3_14g5; 
htes3_14h21; htes3_14pl4; htes3_14p7; htes3_15al3; Htes3_15c24; htes3_15c6; 
htes3_15gl4; htes3_15hl; htes3_15i5; htes3_15jl8; Htes3_15j3; htes3_15kll; 
htes3_17fl0; htes3_17117; htes3_17nl2; htes3_17nl8; Htes3_18f3; htes3_1817; 
htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; htes3_lkll; htes3_20c21; htes3_20k2; 
htes3_20ml8; htes3_21d4; htes3_21jl5; htes3_21116; htes3_21n23; htes3_22c23; 
htes3_22g2; htes3_22nl3; htes3_23111; htes3_23nl9; Htes3_23nl9; htes3_26g22; 
htes3_27dl; htes3_27k4; htes3_27ol4; htes3_28dl4; htes3_2all; htes3_2al7; htes3_2dl5; 
htes3_2el2; htes3_2fl4; htes3_2g7; htes3_2hl; htes3_2hl5; htes3_2119; htes3_2ml8; 
htes3_2m20; htes3_2n9; htes3_2ol3; htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21; 
htes3_35g6; htes3_35kl6; htes3_35k24; htes3_35nl2; htes3_35n24; htes3_35n9;
htes3_35pl7; htes3_35p22; htes3_4b4; htes3_4fl7; htes3_4f5; htes3_4h6; htes3_4ol9;
htes3_50j4; htes3_50n06; htes3_50n23; htes3_6b21; htes3_6cll; htes3_6dl6; htes3_72kll;
Htes3_72kl5; htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; htes3_7j8; htes3_7pl0;
htes3_7p9; htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; Htes3_8p7; Htes3_9e22; 
Htes3_9i20; Htes3_9k22; their complements; and variants thereof. 

9. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: htes3_14g5; htes3_14pl4; 
htes3_14p7; htes3_15al3; htes3_15gl4; htes3_15hl; htes3_15jl8; htes3_17fl0; Htes3_18f3;
htes3_19fl9; htes3_19jl7; htes3_20c21; htes3_21n23; htes3_22c23; htes3_22nl3;
Htes3_23nl9; htes3_27ol4; htes3_28dl4; htes3_2all; htes3_2dl5; htes3_2fl4; htes3_2g7;
htes3_2hl5; htes3_2119; htes3_2m20; htes3_2n9; htes3_30f4; htes3_35g6; htes3_35n24;
htes3_35pl7; htes3_4b4; htes3_4fl7; htes3_4ol9; htes3_50j4; htes3_50n23; htes3_50n06;
htes3_6b21; htes3_6dl6; htes3_72kll; htes3_7dl7; htes3_7j8; Htes3_8gll; Htes3_8g5;
Htes3_8p7; Htes3_9e22; Htes3_9i20; Htes3_9k22; their complements; and variants thereof. 

10. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16gl8; hfbr2_2kl4; 
Htes3_35b4; htes3_35p22; htes3_7j3; htes3_7pl0; hutel_20mll; their complements; and
variants thereof. 

11. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16cl6; hfbr2_2b5; 
htes3_15i5; htes3_1817; htes3_lkll; Htes3_72kl5; htes3_7b22; hutel_19g22; hutel_24j6;
their complements; and variants thereof. 

12. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_2dl5; htes3_35e21; 
hutel_2h3; their complements; and variants thereof. 

13. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_23124; hfbr2_2il7; 
hfbr2_41ml5; hfbr2_62fl0; hfbr2_62119; hfbr2_64jl8; 
hfkd2_24n20; hfkd2_24p5; hfkd2_4kl4; htes3_lgl3; htes3_21116; htes3_23111;
htes3_26g22; htes3_4h6; htes3_72pl6; hutel_19hl7; hutel_20hl3; hutel_24ell; their
complements; and variants thereof. 

14. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_3g8; hfbr2_62ol7; 
hfbr2_6b24; hfbr2_78k24; hfkd2_24bl5; hfkd2_3ol7; hfkd2_46j20; htes3_17117;
htes3_17nl8; htes3_27dl; htes3_2al7; htes3_35b5; htes3_35kl6; htes3_35nl2;
htes3_35n9; hutel_20bl9; hutel_20m24; hutel_23el3; their complements; and variants
thereof. 

15. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_23bl0; hfbr2_3cl8; 
hfbr2_64al5; hfbr2_6ol7; hfbr2_72bl8; hfbr2_72112; hfbr2_82i24 (hfbrl_10);
htes3_14h21; Htes3_15j3; htes3_20ml8; htes3_22g2; htes3_2ml8; htes3_7p9;
htes3_8ml0; hutel_1811; their complements; and variants thereof.

16. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_23b21; hfbr2_23nl6; 
hfbr2_2cl7; hfbr2_62bll; hfbr2_78c24; hfbr2_82e4 (hfbrl_10e4); hfbr2_82il7 
(hfbrl_10); hfbr2_82m6 (hfbrl_10); hfkd2_46m4; htes3_15kll; htes3_lcl; hhtes3_ln3;
htes3_20k2; htes3_21d4; htes3_23nl9; htes3_4f5; htes3_6cll; htes3_8e24; hutel_20g21; 
hutel_22d2; hutel_22el2; their complements; and variants thereof. 

17. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16il2; hfbr2_16112; 
hfbr2_22hl3; hfbr2_2bl7; hfbr2_2dl7; hfbr2_64k24; hfbr2_82c20 (hfbrl_10c20);
hfbr2_82el7 (hfbrl_10el7); hfbr2_82gl4 (hfbrl_10gl4); hfkd2_24al5; hfkd2_3il3;
hfkd2_4mll; hmcfl_lall; hmcfl_lel5; htes3_15c6; htes3_2ol3; htes3_27k4; htes3_2hl;
htes3_35k24; hutel_19fl9; and hutel_24cl9; their complements; and variants thereof. 

18. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfkd2_46kl9; hfkd2_47a4; 
htes3_2el2; htes3_21jl5; htes3_17nl2; hutel_18il9; hutel_li2; their complements; and
variants thereof. 

19. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hutel_17k7; hutel_18cl2; 
hutel_18il9; hutel_18i4; hutel_1811; hutel_19fl9; hutel_19gl9; hutel_19g22; 
hutel_19hl7; hutel_19jll; hutel_li2; hutel_20bl9; hutel_20g21; hutel_20hl3;
hutel_20mll; hutel_20m24; hutel_21dl5; hutel_22d2; hutel_22el2; hutel_22n2; 
hutel_22o2; hutel_23el3; hutel_23gll; hutel_24cl9; hutel_24ell; hutel_24j6; 
hutel_2h3; their complements; and variants thereof. 

20. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hutel_17k7; hutel_18cl2; 
hutel_18i4; hutel_19gl9; hutel_19jll; hutel_22n2; hutel_21dl5; hutel_22o2; 
hutel_23gll; their complements; and variants thereof.

21. A computer readable medium, comprising in electronic form at least one
nucleic acid or protein sequence of a clone selected from the group consisting of:
hfbr2_16cl6; hfbr2_16f21; hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; hfbr2_16112;
hfbr2_22f21; hfbr2_22hl3; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8;
hfbr2_23bl0; hfbr2_23b21; hfbr2_23f2; hfbr2_23124; ; hfbr2_23nl6; hfbr2_23o24;
hfbr2_23o5; hfbr2_2a2; hfbr2_2bl7; hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; hfbr2_2cl8; 
hfbr2_2dl5; hfbr2_2dl7; hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2il7; 
hfbr2_2kl4; hfbr2_2kl9; hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; hfbr2_312; hfbr2_41ml5; 
hfbr2_62bll; hfbr2_62fl0; hfbr2_62119; hfbr2_62nl0; hfbr2_62ol7; hfbr2_64all;
hfbr2_64al5; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64jl8; 
hfbr2_64k24; hfbr2_64ol6; hfbr2_6al7; hfbr2_6b24; hfbr2_6i20; hfbr2_6ol7; 
hfbr2_71o20; hfbr2_72bl8; hfbr2_72dl3; hfbr2_72112; hfbr2_72ml6; hfbr2_72nl2;
hfbr2_78c24; hfbr2_78dl3; hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22;
hfbr2_7j4; hfbr2_82c20; hfbrl_10c20; hfbr2_82el7; hfbrl_10el7; hfbr2_82e4;;
hfbrl_10e4; hfbr2_82gl4;; hfbrl_10gl4; hfbr2_82il7;; hfbrl_10; hfbr2_82i24;; hfbrl_10;
hfbr2_82ml6;; hfbrl_10; hfbr2_82m6;; hfbrl_10; hfkd2_lj9; hfkd2_24al5; hfkd2_24bl5;
hfkd2_24e23; hfkd2_24n20; hfkd2_24p5; hfkd2_3il3; hfkd2_3ol7; hfkd2_46a6;
hfkd2_46bl0; hfkd2_46dl3; hfkd2_46j20; hfkd2_46kl9; hfkd2_46m4; hfkd2_47a4;
hfkd2_4b6; hfkd2_4c8; hfkd2_4kl4; hfkd2_4mll; hmcfl_lall; hmcfl_lc23; hmcfl_lel5;
hmcfl_lgl3; hhtes3_ln3; htes3_14g5; htes3_14h21; htes3_14pl4; htes3_14p7; 
htes3_15al3; Htes3_15c24; htes3_15c6; htes3_15gl4; htes3_15hl; htes3_15i5; 
htes3_15jl8; Htes3_15j3; htes3_15kll; htes3_17fl0; htes3_17117; htes3_17nl2; 
htes3_17nl8; Htes3_18f3; htes3_1817; htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; 
htes3_lkll; htes3_20c21; htes3_20k2; htes3_20ml8; htes3_21d4; htes3_21jl5; 
htes3_21116; htes3_21n23; htes3_22c23; htes3_22g2; htes3_22nl3; htes3_23111; 
htes3_23nl9; Htes3_23nl9; htes3_26g22; htes3_27dl; htes3_27k4; htes3_27ol4; 
htes3_28dl4; htes3_2all; htes3_2al7; htes3_2dl5; htes3_2el2; htes3_2fl4; htes3_2g7; 
htes3_2hl; htes3_2hl5; htes3_2119; htes3_2ml8; htes3_2m20; htes3_2n9; htes3_2ol3;
htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21; htes3_35g6; htes3_35kl6; 
htes3_35k24; htes3_35nl2; htes3_35n24; htes3_35n9; htes3_35pl7; htes3_35p22; 
htes3_4b4; htes3_4fl7; htes3_4f5; htes3_4h6; htes3_4ol9; htes3_50j4; htes3_50n06; 
htes3_50n23; htes3_6b21; htes3_6cll; htes3_6dl6; htes3_72kll; Htes3_72kl5; 
htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; htes3_7j8; htes3_7pl0; htes3_7p9; 
htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; Htes3_8p7; Htes3_9e22; Htes3_9i20;
Htes3_9k22; hutel_17k7; hutel_18cl2; hutel_18il9; hutel_18i4; hutel_1811;
hutel_19fl9; hutel_19gl9; hutel_19g22; hutel_19hl7; hutel_19jll; hutel_li2; 
hutel_20bl9; hutel_20g21; hutel_20hl3; hutel_20mll; hutel_20m24; hutel_21dl5; 
hutel_22d2; hutel_22el2; hutel_22n2; hutel_22o2; hutel_23el3; hutel_23gll; 
hutel_24cl9; hutel_24ell; hutel_24j6; hutel_2h3; their complements; and variants 
thereof. 

22. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16cl6; hfbr2_16f21; hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; hfbr2_16112; 
hfbr2_22f21; hfbr2_22hl3; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; 
hfbr2_23bl0; hfbr2_23b21; hfbr2_23f2; hfbr2_23124; ; hfbr2_23nl6; hfbr2_23o24; 
hfbr2_23o5; hfbr2_2a2; hfbr2_2bl7; hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; hfbr2_2cl8;
hfbr2_2dl5; hfbr2_2dl7; hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2il7;
hfbr2_2kl4; hfbr2_2kl9; hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; hfbr2_312; hfbr2_41ml5; 
hfbr2_62bll; hfbr2_62fl0; hfbr2_62119; hfbr2_62nl0; hfbr2_62ol7; hfbr2_64all;
hfbr2_64al5; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64jl8; 
hfbr2_64k24; hfbr2_64ol6; hfbr2_6al7; hfbr2_6b24; hfbr2_6i20; hfbr2_6ol7; 
hfbr2_71o20; hfbr2_72bl8; hfbr2_72dl3; hfbr2_72112; hfbr2_72ml6; hfbr2_72nl2; 
hfbr2_78c24; hfbr2_78dl3; hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; 
hfbr2_7j4; hfbr2_82c20; hfbrl_10c20; hfbr2_82el7; hfbrl_10el7; hfbr2_82e4; 
hfbrl_10e4; hfbr2_82gl4; hfbrl_10gl4; hfbr2_82il7; hfbrl_10; hfbr2_82i24; hfbrl_10;
hfbr2_82ml6; hfbrl_10; hfbr2_82m6; hfbrl_10; complements of the nucleic acid
sequences; and variants thereof. 

23. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16f21; hfbr2_16k22; hfbr2_22f21; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; 
hfbr2_23f2; ; hfbr2_23o24; hfbr2_23o5; hfbr2_2a2; hfbr2_2cl; hfbr2_2cl8; hfbr2_2d20; 
hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2kl9; hfbr2_3fl6; hfbr2_312; hfbr2_62nl0;
hfbr2_64all; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64k24;
hfbr2_64ol6; hfbr2_6al7; hfbr2_6i20; hfbr2_71o20; hfbr2_72dl3; hfbr2_72ml6;
hfbr2_72nl2; hfbr2_78dl3; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82ml6;
hfbrl_10; complements of the nucleic acid sequences; and variants thereof.

24. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfkd2_lj9; hfkd2_24al5; hfkd2_24bl5; hfkd2_24e23; hfkd2_24n20; hfkd2_24p5; 
hfkd2_3il3; hfkd2_3ol7; hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_46j20;
hfkd2_46kl9; hfkd2_46m4; hfkd2_47a4; hfkd2_4b6; hfkd2_4c8; hfkd2_4kl4; 
hfkd2_4mll; complements of the nucleic acid sequences; and variants thereof. 

25. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: hfkd2_lj9; 
hfkd2_24e23; hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_4b6; hfkd2_4c8; 
complements of the nucleic acid sequences; and variants thereof. 

26. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hmcfl_lall; hmcfl_lc23; hmcfl_lel5; hmcfl_lgl3; complements of the nucleic acid
sequences; and variants thereof. 

27. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hmcfl_lc23; hmcfl_lgl3; complements of the nucleic acid sequences; and variants thereof. 

28. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hhtes3_ln3; htes3_14g5; htes3_14h21; htes3_14pl4; htes3_14p7; htes3_15al3; 
Htes3_15c24; htes3_15c6; htes3_15gl4; htes3_15hl; htes3_15i5; htes3_15jl8; Htes3_15j3; 
htes3_15kll; htes3_17fl0; htes3_17117; htes3_17nl2; htes3_17nl8; Htes3_18f3;
htes3_1817; htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; htes3_lkll; htes3_20c21; 
htes3_20k2; htes3_20ml8; htes3_21d4; htes3_21jl5; htes3_21116; htes3_21n23; 
htes3_22c23; htes3_22g2; htes3_22nl3; htes3_23111; htes3_23nl9; Htes3_23nl9; 
htes3_26g22; htes3_27dl; htes3_27k4; htes3_27ol4; htes3_28dl4; htes3_2all; 
htes3_2al7; htes3_2dl5; htes3_2el2; htes3_2fl4; htes3_2g7; htes3_2hl; htes3_2hl5; 
htes3_2119; htes3_2ml8; htes3_2m20; htes3_2n9; htes3_2ol3; htes3_30f4; Htes3_35b4;
htes3_35b5; htes3_35e21; htes3_35g6; htes3_35kl6; htes3_35k24; htes3_35nl2; 
htes3_35n24; htes3_35n9; htes3_35pl7; htes3_35p22; htes3_4b4; htes3_4fl7; htes3_4f5; 
htes3_4h6; htes3_4ol9; htes3_50j4; htes3_50n06; htes3_50n23; htes3_6b21; htes3_6cll; 
htes3_6dl6; htes3_72kll; Htes3_72kl5; htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; 
htes3_7j8; htes3_7pl0; htes3_7p9; htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; 
Htes3_8p7; Htes3_9e22; Htes3_9i20; Htes3_9k22; complements of the nucleic acid 
sequences; and variants thereof. 

29. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: htes3_14g5; 
htes3_14pl4; htes3_14p7; htes3_15al3; htes3_15gl4; htes3_15hl; htes3_15jl8;
htes3_17fl0; htes3_17nl8; Htes3_18f3; htes3_19fl9; htes3_19jl7; htes3_20c21; 
htes3_21n23; htes3_22c23; htes3_22nl3; Htes3_23nl9; htes3_27ol4; htes3_28dl4; 
htes3_2all; htes3_2dl5; htes3_2fl4; htes3_2g7; htes3_2hl5; htes3_2119; htes3_2m20;
htes3_2n9; htes3_30f4; htes3_35g6; htes3_35n24; htes3_35pl7; htes3_4b4; htes3_4fl7;
htes3_4ol9; htes3_50j4; htes3_50n23; htes3_50n06; htes3_6b21; htes3_6dl6; htes3_72kll;
htes3_7dl7; htes3_7j8; Htes3_8gll; Htes3_8g5; Htes3_8p7; Htes3_9e22; Htes3_9i20;
Htes3_9k22; complements of the nucleic acid sequences; and variants thereof. 

30. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16gl8; hfbr2_2kl4; Htes3_35b4; htes3_35p22; htes3_7j3; htes3_7pl0;
hutel_20mll; complements of the nucleic acid sequences; and variants thereof. 

31. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16cl6; hfbr2_2b5; htes3_15i5; htes3_1817; htes3_lkll; Htes3_72kl5; htes3_7b22;
hutel_19g22; hutel_24j6; complements of the nucleic acid sequences; and variants thereof. 

32. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_2dl5; htes3_35e21; hutel_2h3; complements of the nucleic acid sequences; and 
variants thereof. 

33. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_23124; hfbr2_2il7; hfbr2_41ml5; hfbr2_62fl0; hfbr2_62119; hfbr2_64jl8;
hfkd2_24n20; hfkd2_24p5; hfkd2_4kl4; htes3_lgl3; htes3_21116; htes3_23111;
htes3_26g22; htes3_4h6; htes3_72pl6; hutel_19hl7; hutel_20hl3; hutel_24ell;
complements of the nucleic acid sequences; and variants thereof. 

34. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_3g8; hfbr2_62ol7; hfbr2_6b24; hfbr2_78k24; hfkd2_24bl5; hfkd2_3ol7;
hfkd2_46j20; htes3_17117; htes3_17nl8; htes3_27dl; htes3_2al7; htes3_35b5;
htes3_35kl6; htes3_35nl2; htes3_35n9; hutel_20bl9; hutel_20m24; hutel_23el3;
complements of the nucleic acid sequences; and variants thereof. 

35. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_23bl0; hfbr2_3cl8; hfbr2_64al5; hfbr2_6ol7; hfbr2_72bl8; hfbr2_72112;
hfbr2_82i24 (hfbrl_10); htes3_14h21; Htes3_15j3; htes3_20ml8; htes3_22g2; htes3_2ml8;
htes3_7p9; htes3_8ml0; hutel_1811; complements of the nucleic acid sequences; and 
variants thereof. 

36. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_23b21; hfbr2_23nl6; hfbr2_2cl7; hfbr2_62bll; hfbr2_78c24; hfbr2_82e4
(hfbrl_10e4); hfbr2_82il7 (hfbrl_10); hfbr2_82m6 (hfbrl_10); hfkd2_46m4; htes3_15kll;
htes3_lcl; hhtes3_ln3; htes3_20k2; htes3_21d4; htes3_23nl9; htes3_4f5; htes3_6cll;
htes3_8e24; hutel_20g21; hutel_22d2; hutel_22el2; complements of the nucleic acid 
sequences; and variants thereof. 

37. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16il2; hfbr2_16112; hfbr2_22hl3; hfbr2_2bl7; hfbr2_2dl7; hfbr2_64k24;
hfbr2_82c20 (hfbrl_10c20); hfbr2_82el7 (hfbrl_10el7); hfbr2_82gl4 (hfbrl_10gl4);
hfkd2_24al5; hfkd2_3il3; hfkd2_4mll; hmcfl_lall; hmcfl_lel5; htes3_15c6;
htes3_2ol3; htes3_27k4; htes3_2hl; htes3_35k24; hutel_19fl9; and hutel_24cl9;
complements of the nucleic acid sequences; and variants thereof. 

38. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfkd2_46kl9; hfkd2_47a4; htes3_2el2; htes3_21jl5; htes3_17nl2; hutel_18il9;
hutel_li2; complements of the nucleic acid sequences; and variants thereof. 

39. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hutel_17k7; hutel_18cl2; hutel_18il9; hutel_18i4; hutel_1811; hutel_19fl9;
hutel_19gl9; hutel_19g22; hutel_19hl7; hutel_19jll; hutel_li2; hutel_20bl9; 
hutel_20g21; hutel_20hl3; hutel_20mll; hutel_20m24; hutel_21dl5; hutel_22d2; 
hutel_22el2; hutel_22n2; hutel_22o2; hutel_23el3; hutel_23gll; hutel_24cl9; 
hutel_24ell; hutel_24j6; hutel_2h3; complements of the nucleic acid sequences; and 
variants thereof. 

40. A computer readable medium, comprising in electronic form at least one
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hutel_17k7; hutel_18cl2; hutel_18i4; hutel_19gl9; hutel_19jll; hutel_22n2; 
hutel_21dl5; hutel_22o2; hutel_23gll; complements of the nucleic acid sequences; and
variants thereof. 

41. A nucleic acid molecule having the sequence of a clone selected from the
group consisting of hfbr2_16cl6; hfbr2_16f21; hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; 
hfbr2_16112; hfbr2_22f21; hfbr2_22hl3; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3;
hfbr2_22k8; hfbr2_23bl0; hfbr2_23b21; hfbr2_23f2; hfbr2_23124; hfbr2_23nl6; 
hfbr2_23o24; hfbr2_23o5; hfbr2_2a2; hfbr2_2bl7; hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; 
hfbr2_2cl8; hfbr2_2dl5; hfbr2_2dl7; hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; 
hfbr2_2il7; hfbr2_2kl4; hfbr2_2kl9; hfbr2_3bl6; hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; 
hfbr2_312; hfbr2_41ml5; hfbr2_62bll; hfbr2_62fl0; hfbr2_62119; hfbr2_62nl0; 
hfbr2_62ol7; hfbr2_64all; hfbr2_64al5; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6;
hfbr2_64i20; hfbr2_64jl8; hfbr2_64k24; hfbr2_64ol6; hfbr2_6al7; hfbr2_6b24;
hfbr2_6i20; hfbr2_6ol7; hfbr2_71o20; hfbr2_72bl8; hfbr2_72dl3; hfbr2_72112;
hfbr2_72ml6; hfbr2_72nl2; hfbr2_78c24; hfbr2_78dl3; hfbr2_78k24; hfbr2_78n23;
hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82c20; hfbrl_10c20; hfbr2_82el7;
hfbrl_10el7; hfbr2_82e4;; hfbrl_10e4; hfbr2_82gl4;; hfbrl_10gl4; hfbr2_82il7;; 
hfbrl_10; hfbr2_82i24;; hfbrl_10; hfbr2_82ml6;; hfbrl_10; hfbr2_82m6;; hfbrl_10; 
hfkd2_lj9; hfkd2_24al5; hfkd2_24bl5; hfkd2_24e23; hfkd2_24n20; hfkd2_24p5; 
hfkd2_3il3; hfkd2_3ol7; hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_46j20; 
hfkd2_46kl9; hfkd2_46m4; hfkd2_47a4; hfkd2_4b6; hfkd2_4c8; hfkd2_4kl4; 
hfkd2_4mll; hmcfljall; hmcfl_lc23; hmcfl_lel5; hmcfl_lgl3; hhtes3_ln3; 
htes3_14g5; htes3_14h21; htes3_14pl4; htes3_14p7; htes3_15al3; Htes3_15c24; 
htes3_15c6; htes3_15gl4; htes3_15hl; htes3_15i5; htes3_15jl8; Htes3_15j3; htes3_15kll; 
htes3_17fl0; htes3_17U7; htes3_17nl2; htes3_17nl8; Htes3_18f3; htes3_1817; 
htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; htes3_lkll; htes3_20c21; htes3_20k2; 
htes3_20ml8; htes3_21d4; htes3_21jl5; htes3_21116; htes3_21n23; htes3_22c23; 
htes3_22g2; htes3_22nl3; htes3_23111; htes3_23nl9; Htes3_23nl9; htes3_26g22; 
htes3_27dl; htes3_27k4; htes3_27ol4; htes3_28dl4; htes3_2all; htes3_2al7; htes3_2dl5; 
htes3_2el2; htes3_2fl4; htes3_2g7; htes3_2hl; htes3_2hl5; htes3_2119; htes3_2ml8; 
htes3_2m20; htes3_2n9; htes3_2o13; htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21; 
htes3_35g6; htes3_35k16; htes3_35k24; htes3_35n12; htes3_35n24; htes3_35n9; 
htes3_35p17; htes3_35p22; htes3_4b4; htes3_4f17; htes3_4f5; htes3_4h6; htes3_4o19; 
htes3_50j4; htes3_50n06; htes3_50n23; htes3_6b21; htes3_6c11; htes3_6d16; htes3_72k11; 
Htes3_72k15; htes3_72p16; htes3_7b22; htes3_7d17; htes3_7j3; htes3_7j8; htes3_7p10; 
htes3_7p9; htes3_8e24; Htes3_8g11; Htes3_8g5; htes3_8m10; Htes3_8p7; Htes3_9e22; 
Htes3_9i20; Htes3_9k22; hute1_17k7; hute1_18c12; hute1_18i19; hute1_18i4; hute1_18l1; 
hute1_19f19; hute1_19g19; hute1_19g22; hute1_19h17; hute1_19j11; hute1_1i2; 
hute1_20b19; hute1_20g21; hute1_20h13; hute1_20m11; hute1_20m24; hute1_21d15; 
hute1_22d2; hute1_22e12; hute1_22n2; hute1_22o2; hute1_23e13; hute1_23g11; 
hute1_24c19; hute1_24e11; hute1_24j6; hute1_2h3; their complements; and variants 
thereof. 

42. A polypeptide encoded by the nucleic acid molecule according to claim 41. 

43. An antibody or fragment thereof that is capable of binding to a specific portion 
of the peptide according to claim 42. 

44. A pharmaceutical composition, comprising (a) an effective amount of a 
pharmaceutical agent, wherein said pharmaceutical agent is selected from the group consisting 
of the polypeptide according to claim 42, variants or functional derivatives thereof, and 
antibodies thereto; and (b) a physiologically acceptable carrier or excipient. 

45. An expression vector comprising the nucleic acid molecule of claim 41 or a 
fragment thereof, and optionally a promoter operably linked to said nucleic acid molecule or 
said fragment. 

46. A method for recombinantly producing a desired peptide, comprising expressing 
in a host cell a peptide encoded by the nucleic acid molecule according to claim 41. 